- Various virtual vs physical address usage fixes
- Add new bitwise types and helper functions and use them in s390 specific
drivers and code to make it easier to find virtual vs physical address
usage bugs. Right now virtual and physical addresses are identical for
s390, except for module, vmalloc, and similar areas. This will be
changed, hopefully with the next merge window, so that e.g. the kernel
image and modules will be located close to each other, allowing for
direct branches and also for some other simplifications.
As a prerequisite this requires to fix all misuses of virtual and
physical addresses. As it turned out people are so used to the concept
that virtual and physical addresses are the same, that new bugs got added
to code which was already fixed. In order to avoid that even more code
gets merged which adds such bugs add and use new bitwise types, so that
sparse can be used to find such usage bugs.
Most likely the new types can go away again after some time
- Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation
- Fix kprobe branch handling: if an out-of-line single stepped relative
branch instruction has a target address within a certain address area in
the entry code, the program check handler may incorrectly execute cleanup
code as if KVM code was executed, leading to crashes
- Fix reference counting of zcrypt card objects
- Various other small fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmX5lIYACgkQIg7DeRsp
bsJYxA//cYGSaopFLuHxQndQi5UcMx0NMcaD9auMyDraPWJH0/F+vIWnLPxgJ1wB
zfav3Ssvdd/41rPIXHSGxUzXzv+EHCY5+91t4plfCxWtd1SbpJ8gG2vrXTH512ql
RhJ7crRFQqoigIxlsdwUkwGSqd+L6H73ikKzQyAaJFKC/e9BEpCH4zb8NuTqCeJE
a81/A8BGSIo4Fk4qj1T5bHZzkznmxtisMPGXzoKa28LKhzwqqbGZYeHohnkaT6co
UpY/HMdhGH74kkKqpF0cLDI0Bd8ptfjbcVcKyJ8OD7vCfinIM3bZdYg0KIzyRMhu
d49JDXctPiePXTCi8X+AJnhNYFGlHuDciEpTMzkw97kwhmCfAOW5UaAlBo8dJqat
zt76Cxbl3D+hYI7Wbs9lt2QsXso4/1fMMNQbu9pwyMKxHlCBVe+ZqNIl0gP8OAXZ
51pghcLlcwYNeYRkSzvfEhO9aeUsRg40O5UBCklVMRScrx7qie2wsllEFavb7vZd
Ejv89dvn0KtzYIHG+U5SQ9d0x1JL9qApVM06sCGZGsBM9r4hHFGa8x57Pr+51ZPs
j0qbr7iAWgCGXN8c3p/Nrev6eqBio81CpD9PWik7QpJS38EKkussqfPdQitgQpsi
7r0Sx51oBzyAVadmVAf8/flUrbJTvD3BdbkzN99W4BXyARJk7CI=
=Vh2x
-----END PGP SIGNATURE-----
Merge tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
- Various virtual vs physical address usage fixes
- Add new bitwise types and helper functions and use them in s390
specific drivers and code to make it easier to find virtual vs
physical address usage bugs.
Right now virtual and physical addresses are identical for s390,
except for module, vmalloc, and similar areas. This will be changed,
hopefully with the next merge window, so that e.g. the kernel image
and modules will be located close to each other, allowing for direct
branches and also for some other simplifications.
As a prerequisite this requires to fix all misuses of virtual and
physical addresses. As it turned out people are so used to the
concept that virtual and physical addresses are the same, that new
bugs got added to code which was already fixed. In order to avoid
that even more code gets merged which adds such bugs add and use new
bitwise types, so that sparse can be used to find such usage bugs.
Most likely the new types can go away again after some time
- Provide a simple ARCH_HAS_DEBUG_VIRTUAL implementation
- Fix kprobe branch handling: if an out-of-line single stepped relative
branch instruction has a target address within a certain address area
in the entry code, the program check handler may incorrectly execute
cleanup code as if KVM code was executed, leading to crashes
- Fix reference counting of zcrypt card objects
- Various other small fixes and cleanups
* tag 's390-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390/entry: compare gmap asce to determine guest/host fault
s390/entry: remove OUTSIDE macro
s390/entry: add CIF_SIE flag and remove sie64a() address check
s390/cio: use while (i--) pattern to clean up
s390/raw3270: make class3270 constant
s390/raw3270: improve raw3270_init() readability
s390/tape: make tape_class constant
s390/vmlogrdr: make vmlogrdr_class constant
s390/vmur: make vmur_class constant
s390/zcrypt: make zcrypt_class constant
s390/mm: provide simple ARCH_HAS_DEBUG_VIRTUAL support
s390/vfio_ccw_cp: use new address translation helpers
s390/iucv: use new address translation helpers
s390/ctcm: use new address translation helpers
s390/lcs: use new address translation helpers
s390/qeth: use new address translation helpers
s390/zfcp: use new address translation helpers
s390/tape: fix virtual vs physical address confusion
s390/3270: use new address translation helpers
s390/3215: use new address translation helpers
...
With the current implementation, there are some cornercases where
a host fault would be treated as a guest fault, for example
when the sie instruction causes a program check. Therefore store
the gmap asce in ptregs, and use that to compare the primary asce
from the fault instead of matching instruction addresses.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With only one OUTSIDE user left, remove the macro and move the code
directly to the machine check handler. This has the advantage that
it is much easier to determine which registers are used.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When a program check, interrupt or machine check is triggered, the
PSW address is compared to a certain range of the sie64a() function
to figure out whether SIE was interrupted and a cleanup of SIE is
needed.
This doesn't work with kprobes: If kprobes probes an instruction, it
copies the instruction to the kprobes instruction page and overwrites the
original instruction with an undefind instruction (Opcode 00). When this
instruction is hit later, kprobes single-steps the instruction on the
kprobes_instruction page.
However, if this instruction is a relative branch instruction it will now
point to a different location in memory due to being moved to the kprobes
instruction page. If the new branch target points into sie64a() the kernel
assumes it interrupted SIE when processing the breakpoint and will crash
trying to access the SIE control block.
Instead of comparing the address, introduce a new CIF_SIE flag which
indicates whether SIE was interrupted.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series "mm:
zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is hotplugged
as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving policy
wherein we allocate memory across nodes in a weighted fashion rather
than uniformly. This is beneficial in heterogeneous memory environments
appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the process
has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown situations.
The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings" Ryan
Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's series
"Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page faults.
He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction test",
Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess in
our handling of DAX on archiecctures which have virtually aliasing data
caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides dramatic
improvements in worst-case mmap_lock hold times during certain
userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability improvements
in his series "Mitigate a vmap lock contention". It realizes a 12x
improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging of
large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages() to
an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which are
configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZfJpPQAKCRDdBJ7gKXxA
joxeAP9TrcMEuHnLmBlhIXkWbIR4+ki+pA3v+gNTlJiBhnfVSgD9G55t1aBaRplx
TMNhHfyiHYDTx/GAV9NXW84tasJSDgA=
=TG55
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
Current average steal timer calculation produces volatile and inflated
values. The only user of this value is KVM so far and it uses that to
decide whether or not to yield the vCPU which is seeing steal time.
KVM compares average steal timer to a threshold and if the threshold
is past then it does not allow CPU polling and yields it to host, else
it keeps the CPU by polling.
Since KVM's steal time threshold is very low by default (%10) it most
likely is not effected much by the bloated average steal timer values
because the operating region is pretty small. However there might be
new users in the future who might rely on this number. Fix average
steal timer calculation by changing the formula from:
avg_steal_timer = avg_steal_timer / 2 + steal_timer;
to the following:
avg_steal_timer = (avg_steal_timer + steal_timer) / 2;
This ensures that avg_steal_timer is actually a naive average of steal
timer values. It now closely follows steal timer values but of course
in a smoother manner.
Fixes: 152e9b8676 ("s390/vtime: steal time exponential moving average")
Signed-off-by: Mete Durlu <meted@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
As provided with commit cd4386a931 ("s390/cpcmd,vmcp: avoid GFP_DMA
allocations") the Diagnose Code 8 response buffer does not have to be
below 2GB.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
- Various virtual vs physical address usage fixes
- Fix error handling in Processor Activity Instrumentation device driver, and
export number of counters with a sysfs file
- Allow for multiple events when Processor Activity Instrumentation counters
are monitored in system wide sampling
- Change multiplier and shift values of the Time-of-Day clock source to improve
steering precision
- Remove a couple of unneeded GFP_DMA flags from allocations
- Disable mmap alignment if randomize_va_space is also disabled, to avoid a too
small heap
- Various changes to allow s390 to be compiled with LLVM=1, since ld.lld and
llvm-objcopy will have proper s390 support witch clang 19
- Add __uninitialized macro to Compiler Attributes. This is helpful with s390's
FPU code where some users have up to 520 byte stack frames. Clearing such
stack frames (if INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO is enabled)
before they are used contradicts the intention (performance improvement) of
such code sections.
- Convert switch_to() to an out-of-line function, and use the generic switch_to
header file
- Replace the usage of s390's debug feature with pr_debug() calls within the
zcrypt device driver
- Improve hotplug support of the Adjunct Processor device driver
- Improve retry handling in the zcrypt device driver
- Various changes to the in-kernel FPU code:
- Make in-kernel FPU sections preemptible
- Convert various larger inline assemblies and assembler files to C, mainly
by using singe instruction inline assemblies. This increases readability,
but also allows makes it easier to add proper instrumentation hooks
- Cleanup of the header files
- Provide fast variants of csum_partial() and csum_partial_copy_nocheck() based
on vector instructions
- Introduce and use a lock to synchronize accesses to zpci device data
structures to avoid inconsistent states caused by concurrent accesses
- Compile the kernel without -fPIE. This addresses the following problems if
the kernel is compiled with -fPIE:
- It uses dynamic symbols (.dynsym), for which the linker refuses to allow
more than 64k sections. This can break features which use
'-ffunction-sections' and '-fdata-sections', including kpatch-build and
function granular KASLR
- It unnecessarily uses GOT relocations, adding an extra layer of indirection
for many memory accesses
- Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
reported as globally shared
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmXu3jEACgkQIg7DeRsp
bsJC8A/9Gi9JSMKWpIDR4WE2MQGwP/PnYdEamtK6c9ewOjIR/UzRIyIM3J1pyV0L
RwL8k7EBuv3f7shTcwfPzZWlnAwNwqr1UdcafjFNtHTig50YtdP5fBL33frKHBrm
ATedlCjagojOuVbh1gB45WUgzjSSkPyn0vqwjjo4h6uEAQ35zMEWwCs5Hpajlkhi
GCdJaiBLJcnhT96QGurQdke+MsrpGCzeBVBnA0qopQEWaQo8OdiAJ1uMD2WKbgPR
817kNzvmE6nXnfd5JevYbaiLjK/HQUSw2dZUS6/fjuIrzTsZEUhSg4ECaprKXDg7
5qiVVPNg4WbJAp0SsB+w7c4U99VxhbS7IVHXju18GrXw6SSAupdxIo7R7YiaT8vC
YIXZ1uIQ4Vbts3w/UqWUczIl/ooQt2DdrWT5NDNA+84OlOM42rthzA3vznTWuPTb
U21R7cZmN++hAUjR6s4aO2LfS7HQdnKL8nvJW2y99qSfrOXm+M973W2pDhYEVXQh
ixQ/lxfQpbBT1yUGlquIErokCPB85VY6ZTdGu6Erziywf4CWGsT5CspyaQnX2KTJ
s4CpFPnilrW3OnxmIkrM+pNJDun1nnkGA388Xq1NEKX8Oe65OMXEFNCb0kAHQ1ua
vb6534Ib/iuPnxsGpz1sX9iRqtUd06aBovPcbwIvatHCSfkWws8=
=KZ31
-----END PGP SIGNATURE-----
Merge tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Various virtual vs physical address usage fixes
- Fix error handling in Processor Activity Instrumentation device
driver, and export number of counters with a sysfs file
- Allow for multiple events when Processor Activity Instrumentation
counters are monitored in system wide sampling
- Change multiplier and shift values of the Time-of-Day clock source to
improve steering precision
- Remove a couple of unneeded GFP_DMA flags from allocations
- Disable mmap alignment if randomize_va_space is also disabled, to
avoid a too small heap
- Various changes to allow s390 to be compiled with LLVM=1, since
ld.lld and llvm-objcopy will have proper s390 support witch clang 19
- Add __uninitialized macro to Compiler Attributes. This is helpful
with s390's FPU code where some users have up to 520 byte stack
frames. Clearing such stack frames (if INIT_STACK_ALL_PATTERN or
INIT_STACK_ALL_ZERO is enabled) before they are used contradicts the
intention (performance improvement) of such code sections.
- Convert switch_to() to an out-of-line function, and use the generic
switch_to header file
- Replace the usage of s390's debug feature with pr_debug() calls
within the zcrypt device driver
- Improve hotplug support of the Adjunct Processor device driver
- Improve retry handling in the zcrypt device driver
- Various changes to the in-kernel FPU code:
- Make in-kernel FPU sections preemptible
- Convert various larger inline assemblies and assembler files to
C, mainly by using singe instruction inline assemblies. This
increases readability, but also allows makes it easier to add
proper instrumentation hooks
- Cleanup of the header files
- Provide fast variants of csum_partial() and
csum_partial_copy_nocheck() based on vector instructions
- Introduce and use a lock to synchronize accesses to zpci device data
structures to avoid inconsistent states caused by concurrent accesses
- Compile the kernel without -fPIE. This addresses the following
problems if the kernel is compiled with -fPIE:
- It uses dynamic symbols (.dynsym), for which the linker refuses
to allow more than 64k sections. This can break features which
use '-ffunction-sections' and '-fdata-sections', including
kpatch-build and function granular KASLR
- It unnecessarily uses GOT relocations, adding an extra layer of
indirection for many memory accesses
- Fix shared_cpu_list for CPU private L2 caches, which incorrectly were
reported as globally shared
* tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (117 commits)
s390/tools: handle rela R_390_GOTPCDBL/R_390_GOTOFF64
s390/cache: prevent rebuild of shared_cpu_list
s390/crypto: remove retry loop with sleep from PAES pkey invocation
s390/pkey: improve pkey retry behavior
s390/zcrypt: improve zcrypt retry behavior
s390/zcrypt: introduce retries on in-kernel send CPRB functions
s390/ap: introduce mutex to lock the AP bus scan
s390/ap: rework ap_scan_bus() to return true on config change
s390/ap: clarify AP scan bus related functions and variables
s390/ap: rearm APQNs bindings complete completion
s390/configs: increase number of LOCKDEP_BITS
s390/vfio-ap: handle hardware checkstop state on queue reset operation
s390/pai: change sampling event assignment for PMU device driver
s390/boot: fix minor comment style damages
s390/boot: do not check for zero-termination relocation entry
s390/boot: make type of __vmlinux_relocs_64_start|end consistent
s390/boot: sanitize kaslr_adjust_relocs() function prototype
s390/boot: simplify GOT handling
s390: vmlinux.lds.S: fix .got.plt assertion
s390/boot: workaround current 'llvm-objdump -t -j ...' behavior
...
With commit 36bbc5b4ff ("cacheinfo: Allow early detection and population
of cache attributes") the shared cpu list for each cache level higher than
L1 is rebuilt even if the list already has been set up.
This is caused by the removal of the cpumask_empty() check within
cache_shared_cpu_map_setup().
However architectures can enforce that the shared cpu list is not rebuilt
by simply setting cpu_map_populated of the per cpu cache info structure to
true, which is also the fix for this problem.
Before:
$ cat /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list
0-7
After:
$ cat /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list
1
Fixes: 36bbc5b4ff ("cacheinfo: Allow early detection and population of cache attributes")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently only one PAI sampling event can be created and active
at any one time. The PMU device drivers store a pointer to this
event in their data structures even when the event is created
for counting and the PMU device driver reference to this counting
event is never needed.
Change this and assign the pointer to the PMU device driver
only when a sampling event is created.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The end of GOT is calculated dynamically on boot. The size of GOT
is calculated on build from the start and end of GOT. Avoid both
calculations and use the end of GOT directly.
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Naresh reported this build error on linux-next:
s390x-linux-gnu-ld: Unexpected GOT/PLT entries detected!
make[3]: *** [/builds/linux/arch/s390/boot/Makefile:87:
arch/s390/boot/vmlinux.syms] Error 1
make[3]: Target 'arch/s390/boot/bzImage' not remade because of errors.
The reason for the build error is an incorrect/incomplete assertion which
checks the size of the .got.plt section. Similar to x86 the size is either
zero or 24 bytes (three entries).
See commit 262b5cae67 ("x86/boot/compressed: Move .got.plt entries out of
the .got section") for more details. The three reserved/additional entries
for s390 are described in chapter 3.2.2 of the s390x ABI [1] (thanks to
Andreas Krebbel for pointing this out!).
[1] https://github.com/IBM/s390x-abi/releases/download/v1.6.1/lzsabi_s390x.pdf
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvWp8TY-fMEvc3UhoVtoR_eM5VsfHj3+n+kexcfJJ+Cvw@mail.gmail.com
Fixes: 30226853d6 ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Nathan reported below building error:
=====
$ curl -LSso .config https://git.alpinelinux.org/aports/plain/community/linux-edge/config-edge.armv7
$ make -skj"$(nproc)" ARCH=arm CROSS_COMPILE=arm-linux-gnueabi- olddefconfig all
..
arm-linux-gnueabi-ld: arch/arm/kernel/machine_kexec.o: in function `arch_crash_save_vmcoreinfo':
machine_kexec.c:(.text+0x488): undefined reference to `vmcoreinfo_append_str'
====
On architecutres, like arm, s390, ppc, sh, function
arch_crash_save_vmcoreinfo() is located in machine_kexec.c and it can
only be compiled in when CONFIG_KEXEC_CORE=y.
That's not right because arch_crash_save_vmcoreinfo() is used to export
arch specific vmcoreinfo. CONFIG_VMCORE_INFO is supposed to control its
compiling in. However, CONFIG_VMVCORE_INFO could be independent of
CONFIG_KEXEC_CORE, e.g CONFIG_PROC_KCORE=y will select CONFIG_VMVCORE_INFO.
Or CONFIG_KEXEC/CONFIG_KEXEC_FILE is set while CONFIG_CRASH_DUMP is
not set, it will report linking error.
So, on arm, s390, ppc and sh, move arch_crash_save_vmcoreinfo out to
a new file vmcore_info.c. Let CONFIG_VMCORE_INFO decide if compiling in
arch_crash_save_vmcoreinfo().
[akpm@linux-foundation.org: remove stray newlines at eof]
Link: https://lkml.kernel.org/r/20240129135033.157195-3-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Closes: https://lore.kernel.org/all/20240126045551.GA126645@dev-arch.thelio-3990X/T/#u
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Now crash codes under kernel/ folder has been split out from kexec
code, crash dumping can be separated from kexec reboot in config
items on s390 with some adjustments.
Here wrap up crash dumping codes with CONFIG_CRASH_DUMP ifdeffery.
Link: https://lkml.kernel.org/r/20240124051254.67105-10-bhe@redhat.com
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Pingfan Liu <piliu@redhat.com>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Michael Kelley <mhklinux@outlook.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There is already a generic union definition for vdso_data_store in the vdso
datapage header.
Use this definition to prevent code duplication.
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219153939.75719-8-anna-maria@linutronix.de
On s390, currently kernel uses the '-fPIE' compiler flag for compiling
vmlinux. This has a few problems:
- It uses dynamic symbols (.dynsym), for which the linker refuses to
allow more than 64k sections. This can break features which use
'-ffunction-sections' and '-fdata-sections', including kpatch-build
[1] and Function Granular KASLR.
- It unnecessarily uses GOT relocations, adding an extra layer of
indirection for many memory accesses.
Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.
This is done by first telling the linker to preserve all relocations
during the vmlinux link. (Note this is harmless: they are later
stripped in the vmlinux.bin link.)
Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections. The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.
(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'. So Clang-compiled kernels have a GOT, which needs to
be adjusted.)
On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.
[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html
Compiler consideration:
Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.
Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.
In addition to it:
move vmlinux.relocs to safe relocation
When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.
Compile warning fix from Sumanth Korikkar:
When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:
ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.
Eliminate the warning by placing all .rela orphan sections in .rela.dyn
and raise an error when size of .rela.dyn is greater than zero. i.e.
Dont just neglect orphan sections.
This is similar to adjustment performed in x86, where kernel is built
with -fno-PIE.
commit 5354e84598 ("x86/build: Add asserts for unwanted sections")
[sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move
vmlinux.relocs to safe location]
[hca@linux.ibm.com: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Gcc recently implemented an optimization [1] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [2] in recent gcc versions.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html
However, when -munaligned-symbols is used in vdso code, it leads to the
following compilation error:
`.data.rel.ro.local' referenced in section `.text' of
arch/s390/kernel/vdso64/vdso64_generic.o: defined in discarded section
`.data.rel.ro.local' of arch/s390/kernel/vdso64/vdso64_generic.o
vdso linker script discards .data section to make it lightweight.
However, -munaligned-symbols in vdso object files references literal
pool and accesses _vdso_data. Hence, compile vdso code without
-munaligned-symbols. This means in the future, vdso code should deal
with alignment of newly introduced unaligned linker symbols.
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-2-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When an event is started, read the current value of the
PAI counter. This value is saved in event::hw.prev_count.
When an event is stopped, this value is subtracted from the current
value read out at event stop time. The difference is the delta
of this counter.
Simplify the logic and read the event value every time the event is
started. This scheme is identical to other device drivers.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When the PAI events ALL_CRYPTO or ALL_NNPA are created
for system wide sampling, all PAI counters are monitored.
On each process schedule out, the values of all PAI counters
are investigated. Non-zero values are saved in the event's ring
buffer as raw data. This scheme expects the start value of each counter
to be reset to zero after each read operation performed by the PAI
PMU device driver. This allows for only one active event at any one
time as it relies on the start value of counters to be reset to zero.
Create a save area for each installed PAI XXXX_ALL event and save all
PAI counter values in this save area. Instead of clearing the
PAI counter lowcore area to zero after each read operation,
copy them from the lowcore area to the event's save area at process
schedule out time.
The delta of each PAI counter is calculated by subtracting the
old counter's value stored in the event's save area from the current
value stored in the lowcore area.
With this scheme, mulitple events of the PAI counters XXXX_ALL
can be handled at the same time. This will be addressed in a
follow-on patch.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Provide several one instruction fpu inline assemebles and use them to
implement the bogomips calculation in C like style. This is more for
illustration purposes on how kernel fpu code can be written in C.
This has the advantage that the author only has to take care of the
floating point instructions, but doesn't need to take care of general
purpose register allocation (if needed), and the semantics of all other
instructions not related to fpu.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Convert those callers of csum_partial() to use the cksm instruction,
which are either very early or in critical paths, like panic/dump, so
they don't have to rely on a working kernel infrastructure, which will
be introduced with a subsequent patch.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The first invocation of kernel_fpu_begin() after switching from user to
kernel context will save all vector registers, even if only parts of the
vector registers are used within the kernel fpu context. Given that save
and restore of all vector registers is quite expensive change the current
approach in several ways:
- Instead of saving and restoring all user registers limit this to those
registers which are actually used within an kernel fpu context.
- On context switch save all remaining user fpu registers, so they can be
restored when the task is rescheduled.
- Saving user registers within kernel_fpu_begin() is done without disabling
and enabling interrupts - which also slightly reduces runtime. In worst
case (e.g. interrupt context uses the same registers) this may lead to
the situation that registers are saved several times, however the
assumption is that this will not happen frequently, so that the new
method is faster in nearly all cases.
- save_user_fpu_regs() can still be called from all contexts and saves all
(or all remaining) user registers to a tasks ufpu user fpu save area.
Overall this reduces the time required to save and restore the user fpu
context for nearly all cases.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.
There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The anonymous union within struct fpu contains a floating point register
array and a vector register array. Given that the vector register is always
present remove the floating point register array. For configurations
without vector registers save the floating point register contents within
their corresponding vector register location.
This allows to remove the union, and also to simplify ptrace and perf code.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
KVM was the only user which modified the regs pointer in struct fpu. Remove
the pointer and convert the rest of the core fpu code to directly access
the save area embedded within struct fpu.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
KVM modifies the kernel fpu's regs pointer to its own area to implement its
custom version of preemtible kernel fpu context. With general support for
preemptible kernel fpu context there is no need for the extra complexity in
KVM code anymore.
Therefore convert KVM to a regular kernel fpu user. In particular this
means that all TIF_FPU checks can be removed, since the fpu register
context will never be changed by other kernel fpu users, and also the fpu
register context will be restored if a thread is preempted.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Make the kernel fpu context preemptible. Add another fpu structure to the
thread_struct, and use it to save and restore the kernel fpu context if its
task uses fpu registers when it is preempted.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Change type of fpu mask consistently from u32 to int. This is a
prerequisite to make the kernel fpu usage preemptible. Upcoming code
uses __atomic* ops which work with int pointers.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Rename save_fpu_regs(), load_fpu_regs(), and struct thread_struct's fpu
member to save_user_fpu_regs(), load_user_fpu_regs(), and ufpu. This way
the function and variable names reflect for which context they are supposed
to be used.
This large and trivial conversion is a prerequisite for making the kernel
fpu usage preemptible.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The FPU state, as represented by the CIF_FPU flag reflects the FPU state of
a task, not the CPU it is running on. Therefore convert the flag to a
regular TIF flag.
This removes the magic in switch_to() where a save_fpu_regs() call for the
currently (previous) running task sets the per-cpu CIF_FPU flag, which is
required to restore FPU register contents of the next task, when it returns
to user space.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Convert the rather large __kernel_fpu_begin()/__kernel_fpu_end() inline
assemblies to C. The C variant is much more readable, and this also allows
to get rid of the non-obvious usage of KERNEL_VXR_* constants within the
inline assemblies. E.g. "tmll %[m],6" correlates with the two bits set in
KERNEL_VXR_LOW. If the corresponding defines would be changed, the inline
assembles would break in a subtle way.
Therefore convert to C, use the proper defines, and allow the compiler to
generate code using the (hopefully) most efficient instructions.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Instead of open-coding vlm and vstm inline assemblies at several locations,
provide an fpu_* function for each instruction, and use them in the new
save_vx_regs() and load_vx_regs() helper functions.
Note that "O" and "R" inline assembly operand modifiers are used in order
to pass the displacement and base register of the memory operands to the
existing VLM and VSTM macros. The two operand modifiers are not available
for clang. Therefore provide two variants of each inline assembly.
The clang variant always uses and clobbers general purpose register 1, like
in the previous inline assemblies, so it can be used as base register with
a zero displacement. This generates slightly less efficient code, but can
be removed as soon as clang has support for the used operand modifiers.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Instead of open-coding lfpc, sfpc, and stfpc inline assemblies at
several locations, provide an fpu_* function for each instruction and
use the function instead.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Deduplicate the 64 ld and std inline assemblies. Provide an fpu inline
assembly for both instructions, and use them in the new save_fp_regs()
and load_fp_regs() helper functions.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The only user of sfpc_safe() needs to read the new fpc register value
from memory before it is set with sfpc.
Avoid this indirection and use lfpc, which reads the new value from
memory. Also add the "fpu_" prefix to have a common name space for fpu
related inline assemblies, and provide memory access instrumentation.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Move, rename, and merge the fpu and vx header files. This way fpu header
files have a consistent naming scheme (fpu*.h).
Also get rid of the fpu subdirectory and move header files to asm
directory, so that all fpu and vx header files can be found at the same
location.
Merge internal.h header file into other header files, since the internal
helpers are used at many locations. so those helper functions are really
not internal.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Address various checkpatch warnings, adjust whitespace, and try to increase
readability. This is just preparation, in order to avoid that subsequent
patches contain any distracting drive-by coding style changes.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use KERNEL_VXR_LOW instead of KERNEL_VXR_V0V7 for configurations without
vector registers in order to decide if floating point registers need to be
saved and restored.
Kernel FPU areas which use floating point registers are supposed to use the
KERNEL_FPR mask, however users may also open-code this and specify
KERNEL_VXR_V0V7 and/or KERNEL_VXR_V8V15. If only KERNEL_VXR_V8V15 is
specified floating point registers wouldn't be saved and restored. Improve
this and check for both bits.
There are currently no users where this would fix a bug.
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Remove the historic machine check handler code which validates registers.
Registers are automatically validated as part of the machine check handling
sequence (see Principles of Operation, Machine-Check Handling chapter,
Validation).
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.
/proc/iomem should report the physical address ranges, so use __pa_symbol()
for resource registration, similar to other architectures.
Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When linking vdso64.so.dbg with ld.lld, there is a warning about not
finding _start for the starting address:
ld.lld: warning: cannot find entry symbol _start; not setting start address
Fix this by removing the unused ENTRY in both vdso linker scripts. See
commit e247172854 ("powerpc/vdso: Remove unused ENTRY in linker
scripts"), which solved the same problem for powerpc, for further details.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are some warnings around certain
ELF sections:
s390-linux-ld: warning: orphan section `.dynstr' from `arch/s390/kernel/head64.o' being placed in section `.dynstr'
s390-linux-ld: warning: orphan section `.dynamic' from `arch/s390/kernel/head64.o' being placed in section `.dynamic'
s390-linux-ld: warning: orphan section `.hash' from `arch/s390/kernel/head64.o' being placed in section `.hash'
s390-linux-ld: warning: orphan section `.gnu.hash' from `arch/s390/kernel/head64.o' being placed in section `.gnu.hash'
Explicitly keep those sections like other architectures when
CONFIG_RELOCATABLE is enabled, which is always true for s390.
[hca@linux.ibm.com: keep sections instead of discarding]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-4-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around the
GOT and PLT sections:
s390-linux-ld: warning: orphan section `.plt' from `arch/s390/kernel/head64.o' being placed in section `.plt'
s390-linux-ld: warning: orphan section `.got' from `arch/s390/kernel/head64.o' being placed in section `.got'
s390-linux-ld: warning: orphan section `.got.plt' from `arch/s390/kernel/head64.o' being placed in section `.got.plt'
s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/kernel/head64.o' being placed in section `.iplt'
s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/kernel/head64.o' being placed in section `.igot.plt'
s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/boot/head.o' being placed in section `.iplt'
s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/boot/head.o' being placed in section `.igot.plt'
s390-linux-ld: warning: orphan section `.got' from `arch/s390/boot/head.o' being placed in section `.got'
Currently, only the '.got' section is actually emitted in the final
binary. In a manner similar to other architectures, put the '.got'
section near the '.data' section and coalesce the PLT sections,
checking that the final section is zero sized, which is a safe/tested
approach versus full discard.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-3-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around
'.data.rel' sections:
s390-linux-ld: warning: orphan section `.data.rel' from `kernel/sched/build_utility.o' being placed in section `.data.rel'
s390-linux-ld: warning: orphan section `.data.rel.local' from `kernel/sched/build_utility.o' being placed in section `.data.rel.local'
s390-linux-ld: warning: orphan section `.data.rel.ro' from `kernel/sched/build_utility.o' being placed in section `.data.rel.ro'
s390-linux-ld: warning: orphan section `.data.rel.ro.local' from `kernel/sched/build_utility.o' being placed in section `.data.rel.ro.local'
Describe these in vmlinux.lds.S so there is no more warning and the
sections are placed consistently between linkers.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-2-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Move the switch_to() implementation to process.c and use the generic
switch_to.h header file instead, like some other architectures.
This addresses also the oddity that the old switch_to() implementation
assigns the return value of __switch_to() to 'prev' instead of 'last',
like it should.
Remove also all includes of switch_to.h from C files, except process.c.
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
save_access_regs() and restore_access_regs() are only available by
including switch_to.h. This is done by a couple of C files, which have
nothing to do with switch_to(), but only need these functions.
Move both functions to a new header file and improve the implementation:
- Get rid of typedef
- Add memory access instrumentation support
- Use long displacement instructions lamy/stamy instead of lam/stam - all
current users end up with better code because of this
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Code sections in s390 specific kernel code which use floating point or
vector registers all come with a 520 byte stack variable to save already in
use registers, if required.
With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled this variable
will always be initialized on function entry in addition to saving register
contents, which contradicts the intention (performance improvement) of such
code sections.
Therefore provide a DECLARE_KERNEL_FPU_ONSTACK() macro which provides
struct kernel_fpu variables with an __uninitialized attribute, and convert
all existing code to use this.
This way only this specific type of stack variable will not be initialized,
regardless of config options.
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Now that the driver core can properly handle constant struct bus_type,
move the stp_subsys variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240204-bus_cleanup-s390_time-v1-1-d2120156982a@marliere.net
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
'-fPIC' as an option to the linker does not do what it seems like it
should. With ld.bfd, it is treated as '-f PIC', which does not make
sense based on the meaning of '-f':
-f SHLIB, --auxiliary SHLIB Auxiliary filter for shared object symbol table
When building with ld.lld (currently under review in a GitHub pull
request), it just errors out because '-f' means nothing and neither does
'-fPIC':
ld.lld: error: unknown argument '-fPIC'
'-fPIC' was blindly copied from CFLAGS when the vDSO stopped being
linked with '$(CC)', it should not be needed. Remove it to clear up the
build failure with ld.lld.
Fixes: 2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")
Link: https://github.com/llvm/llvm-project/pull/75643
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20240130-s390-vdso-drop-fpic-from-ldflags-v1-1-094ad104fc55@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
diag14() is currently only used by the vmur device driver. The third
parameter, called subcommand, determines the type of the first
parameter. For some subcommands the value of the first parameter is an
address to a memory buffer and needs virtual to physical address
conversion. Other subcommands interpret the first parameter is an
integer.
This doesn't fix a bug since virtual and physical addresses
are currently the same.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>