Commit Graph

3676 Commits

Author SHA1 Message Date
Thomas Richter
e22033fddd s390/pai: change sampling event assignment for PMU device driver
Currently only one PAI sampling event can be created and active
at any one time. The PMU device drivers store a pointer to this
event in their data structures even when the event is created
for counting and the PMU device driver reference to this counting
event is never needed.
Change this and assign the pointer to the PMU device driver
only when a sampling event is created.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-03-07 14:41:13 +01:00
Alexander Gordeev
3334fda639 s390/boot: simplify GOT handling
The end of GOT is calculated dynamically on boot. The size of GOT
is calculated on build from the start and end of GOT. Avoid both
calculations and use the end of GOT directly.

Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-26 10:25:09 +01:00
Heiko Carstens
a795e5d234 s390: vmlinux.lds.S: fix .got.plt assertion
Naresh reported this build error on linux-next:

s390x-linux-gnu-ld: Unexpected GOT/PLT entries detected!
make[3]: *** [/builds/linux/arch/s390/boot/Makefile:87:
arch/s390/boot/vmlinux.syms] Error 1
make[3]: Target 'arch/s390/boot/bzImage' not remade because of errors.

The reason for the build error is an incorrect/incomplete assertion which
checks the size of the .got.plt section. Similar to x86 the size is either
zero or 24 bytes (three entries).

See commit 262b5cae67 ("x86/boot/compressed: Move .got.plt entries out of
the .got section") for more details. The three reserved/additional entries
for s390 are described in chapter 3.2.2 of the s390x ABI [1] (thanks to
Andreas Krebbel for pointing this out!).

[1] https://github.com/IBM/s390x-abi/releases/download/v1.6.1/lzsabi_s390x.pdf

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Closes: https://lore.kernel.org/all/CA+G9fYvWp8TY-fMEvc3UhoVtoR_eM5VsfHj3+n+kexcfJJ+Cvw@mail.gmail.com
Fixes: 30226853d6 ("s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-25 18:01:09 +01:00
Josh Poimboeuf
778666df60 s390: compile relocatable kernel without -fPIE
On s390, currently kernel uses the '-fPIE' compiler flag for compiling
vmlinux.  This has a few problems:

  - It uses dynamic symbols (.dynsym), for which the linker refuses to
    allow more than 64k sections.  This can break features which use
    '-ffunction-sections' and '-fdata-sections', including kpatch-build
    [1] and Function Granular KASLR.

  - It unnecessarily uses GOT relocations, adding an extra layer of
    indirection for many memory accesses.

Instead of using '-fPIE', resolve all the relocations at link time and
then manually adjust any absolute relocations (R_390_64) during boot.

This is done by first telling the linker to preserve all relocations
during the vmlinux link.  (Note this is harmless: they are later
stripped in the vmlinux.bin link.)

Then use the 'relocs' tool to find all absolute relocations (R_390_64)
which apply to allocatable sections.  The offsets of those relocations
are saved in a special section which is then used to adjust the
relocations during boot.

(Note: For some reason, Clang occasionally creates a GOT reference, even
without '-fPIE'.  So Clang-compiled kernels have a GOT, which needs to
be adjusted.)

On my mostly-defconfig kernel, this reduces kernel text size by ~1.3%.

[1] https://github.com/dynup/kpatch/issues/1284
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

Compiler consideration:

Gcc recently implemented an optimization [2] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [3] in recent gcc versions. This option has to be used with
future gcc versions.

Older Clang lacks support for handling unaligned symbols generated
by kernel linker scripts when the kernel is built without -fPIE. However,
future versions of Clang will include support for the -munaligned-symbols
option. When the support is unavailable, compile the kernel with -fPIE
to maintain the existing behavior.

In addition to it:
move vmlinux.relocs to safe relocation

When the kernel is built with CONFIG_KERNEL_UNCOMPRESSED, the entire
uncompressed vmlinux.bin is positioned in the bzImage decompressor
image at the default kernel LMA of 0x100000, enabling it to be executed
in-place. However, the size of .vmlinux.relocs could be large enough to
cause an overlap with the uncompressed kernel at the address 0x100000.
To address this issue, .vmlinux.relocs is positioned after the
.rodata.compressed in the bzImage. Nevertheless, in this configuration,
vmlinux.relocs will overlap with the .bss section of vmlinux.bin. To
overcome that, move vmlinux.relocs to a safe location before clearing
.bss and handling relocs.

Compile warning fix from Sumanth Korikkar:

When kernel is built with CONFIG_LD_ORPHAN_WARN and -fno-PIE, there are
several warnings:

ld: warning: orphan section `.rela.iplt' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.head.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.init.text' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'
ld: warning: orphan section `.rela.rodata.cst8' from
`arch/s390/kernel/head64.o' being placed in section `.rela.dyn'

Orphan sections are sections that exist in an object file but don't have
a corresponding output section in the final executable. ld raises a
warning when it identifies such sections.

Eliminate the warning by placing all .rela orphan sections in .rela.dyn
and raise an error when size of .rela.dyn is greater than zero. i.e.
Dont just neglect orphan sections.

This is similar to adjustment performed in x86, where kernel is built
with -fno-PIE.
commit 5354e84598 ("x86/build: Add asserts for unwanted sections")

[sumanthk@linux.ibm.com: rebased Josh Poimboeuf patches and move
 vmlinux.relocs to safe location]
[hca@linux.ibm.com: merged compile warning fix from Sumanth]
Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-4-sumanthk@linux.ibm.com
Link: https://lore.kernel.org/r/20240219132734.22881-5-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Sumanth Korikkar
8192a1b380 s390/vdso64: filter out munaligned-symbols flag for vdso
Gcc recently implemented an optimization [1] for loading symbols without
explicit alignment, aligning with the IBM Z ELF ABI. This ABI mandates
symbols to reside on a 2-byte boundary, enabling the use of the larl
instruction. However, kernel linker scripts may still generate unaligned
symbols. To address this, a new -munaligned-symbols option has been
introduced [2] in recent gcc versions.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622872.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/625986.html

However, when -munaligned-symbols  is used in vdso code, it leads to the
following compilation error:
`.data.rel.ro.local' referenced in section `.text' of
arch/s390/kernel/vdso64/vdso64_generic.o: defined in discarded section
`.data.rel.ro.local' of arch/s390/kernel/vdso64/vdso64_generic.o

vdso linker script discards .data section to make it lightweight.
However, -munaligned-symbols in vdso object files references literal
pool and accesses _vdso_data. Hence, compile vdso code without
-munaligned-symbols.  This means in the future, vdso code should deal
with alignment of newly introduced unaligned linker symbols.

Acked-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20240219132734.22881-2-sumanthk@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:33 +01:00
Thomas Richter
82cb9b6185 s390/pai: simplify event start function for perf stat
When an event is started, read the current value of the
PAI counter. This value is saved in event::hw.prev_count.
When an event is stopped, this value is subtracted from the current
value read out at event stop time. The difference is the delta
of this counter.

Simplify the logic and read the event value every time the event is
started. This scheme is identical to other device drivers.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:32 +01:00
Thomas Richter
fe861b0c8d s390/pai: save PAI counter value page in event structure
When the PAI events ALL_CRYPTO or ALL_NNPA are created
for system wide sampling, all PAI counters are monitored.
On each process schedule out, the values of all PAI counters
are investigated. Non-zero values are saved in the event's ring
buffer as raw data. This scheme expects the start value of each counter
to be reset to zero after each read operation performed by the PAI
PMU device driver. This allows for only one active event at any one
time as it relies on the start value of counters to be reset to zero.

Create a save area for each installed PAI XXXX_ALL event and save all
PAI counter values in this save area. Instead of clearing the
PAI counter lowcore area to zero after each read operation,
copy them from the lowcore area to the event's save area at process
schedule out time.
The delta of each PAI counter is calculated by subtracting the
old counter's value stored in the event's save area from the current
value stored in the lowcore area.

With this scheme, mulitple events of the PAI counters XXXX_ALL
can be handled at the same time. This will be addressed in a
follow-on patch.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-20 14:37:31 +01:00
Heiko Carstens
ea8b75d289 s390/sysinfo: convert bogomips calculation to C
Provide several one instruction fpu inline assemebles and use them to
implement the bogomips calculation in C like style. This is more for
illustration purposes on how kernel fpu code can be written in C.

This has the advantage that the author only has to take care of the
floating point instructions, but doesn't need to take care of general
purpose register allocation (if needed), and the semantics of all other
instructions not related to fpu.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
3a74f44de2 s390/checksum: provide and use cksm() inline assembly
Convert those callers of csum_partial() to use the cksm instruction,
which are either very early or in critical paths, like panic/dump, so
they don't have to rely on a working kernel infrastructure, which will
be introduced with a subsequent patch.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:17 +01:00
Heiko Carstens
8c09871a95 s390/fpu: limit save and restore to used registers
The first invocation of kernel_fpu_begin() after switching from user to
kernel context will save all vector registers, even if only parts of the
vector registers are used within the kernel fpu context. Given that save
and restore of all vector registers is quite expensive change the current
approach in several ways:

- Instead of saving and restoring all user registers limit this to those
  registers which are actually used within an kernel fpu context.

- On context switch save all remaining user fpu registers, so they can be
  restored when the task is rescheduled.

- Saving user registers within kernel_fpu_begin() is done without disabling
  and enabling interrupts - which also slightly reduces runtime. In worst
  case (e.g. interrupt context uses the same registers) this may lead to
  the situation that registers are saved several times, however the
  assumption is that this will not happen frequently, so that the new
  method is faster in nearly all cases.

- save_user_fpu_regs() can still be called from all contexts and saves all
  (or all remaining) user registers to a tasks ufpu user fpu save area.

Overall this reduces the time required to save and restore the user fpu
context for nearly all cases.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
066c40918b s390/fpu: decrease stack usage for some cases
The kernel_fpu structure has a quite large size of 520 bytes. In order to
reduce stack footprint introduce several kernel fpu structures with
different and also smaller sizes. This way every kernel fpu user must use
the correct variant. A compile time check verifies that the correct variant
is used.

There are several users which use only 16 instead of all 32 vector
registers. For those users the new kernel_fpu_16 structure with a size of
only 266 bytes can be used.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
bdbd3acb33 s390/fpu: remove anonymous union from struct fpu
The anonymous union within struct fpu contains a floating point register
array and a vector register array. Given that the vector register is always
present remove the floating point register array. For configurations
without vector registers save the floating point register contents within
their corresponding vector register location.

This allows to remove the union, and also to simplify ptrace and perf code.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
9cbff7f221 s390/fpu: remove regs member from struct fpu
KVM was the only user which modified the regs pointer in struct fpu. Remove
the pointer and convert the rest of the core fpu code to directly access
the save area embedded within struct fpu.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
ed3a0a011a s390/kvm: convert to regular kernel fpu user
KVM modifies the kernel fpu's regs pointer to its own area to implement its
custom version of preemtible kernel fpu context. With general support for
preemptible kernel fpu context there is no need for the extra complexity in
KVM code anymore.

Therefore convert KVM to a regular kernel fpu user. In particular this
means that all TIF_FPU checks can be removed, since the fpu register
context will never be changed by other kernel fpu users, and also the fpu
register context will be restored if a thread is preempted.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:16 +01:00
Heiko Carstens
4eed43de9b s390/fpu: make kernel fpu context preemptible
Make the kernel fpu context preemptible. Add another fpu structure to the
thread_struct, and use it to save and restore the kernel fpu context if its
task uses fpu registers when it is preempted.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
c038b984a9 s390/fpu: change type of fpu mask from u32 to int
Change type of fpu mask consistently from u32 to int. This is a
prerequisite to make the kernel fpu usage preemptible. Upcoming code
uses __atomic* ops which work with int pointers.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
87c5c70036 s390/fpu: rename save_fpu_regs() to save_user_fpu_regs(), etc
Rename save_fpu_regs(), load_fpu_regs(), and struct thread_struct's fpu
member to save_user_fpu_regs(), load_user_fpu_regs(), and ufpu. This way
the function and variable names reflect for which context they are supposed
to be used.

This large and trivial conversion is a prerequisite for making the kernel
fpu usage preemptible.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
419abc4d38 s390/fpu: convert FPU CIF flag to regular TIF flag
The FPU state, as represented by the CIF_FPU flag reflects the FPU state of
a task, not the CPU it is running on. Therefore convert the flag to a
regular TIF flag.

This removes the magic in switch_to() where a save_fpu_regs() call for the
currently (previous) running task sets the per-cpu CIF_FPU flag, which is
required to restore FPU register contents of the next task, when it returns
to user space.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
918c7cad66 s390/fpu: convert __kernel_fpu_begin()/__kernel_fpu_end() to C
Convert the rather large __kernel_fpu_begin()/__kernel_fpu_end() inline
assemblies to C. The C variant is much more readable, and this also allows
to get rid of the non-obvious usage of KERNEL_VXR_* constants within the
inline assemblies. E.g. "tmll %[m],6" correlates with the two bits set in
KERNEL_VXR_LOW. If the corresponding defines would be changed, the inline
assembles would break in a subtle way.

Therefore convert to C, use the proper defines, and allow the compiler to
generate code using the (hopefully) most efficient instructions.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
3a5866a001 s390/fpu: provide and use vlm and vstm inline assemblies
Instead of open-coding vlm and vstm inline assemblies at several locations,
provide an fpu_* function for each instruction, and use them in the new
save_vx_regs() and load_vx_regs() helper functions.

Note that "O" and "R" inline assembly operand modifiers are used in order
to pass the displacement and base register of the memory operands to the
existing VLM and VSTM macros. The two operand modifiers are not available
for clang. Therefore provide two variants of each inline assembly.

The clang variant always uses and clobbers general purpose register 1, like
in the previous inline assemblies, so it can be used as base register with
a zero displacement. This generates slightly less efficient code, but can
be removed as soon as clang has support for the used operand modifiers.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
f4e3de75d0 s390/fpu: provide and use lfpc, sfpc, and stfpc inline assemblies
Instead of open-coding lfpc, sfpc, and stfpc inline assemblies at
several locations, provide an fpu_* function for each instruction and
use the function instead.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:15 +01:00
Heiko Carstens
88d8136a08 s390/fpu: provide and use ld and std inline assemblies
Deduplicate the 64 ld and std inline assemblies. Provide an fpu inline
assembly for both instructions, and use them in the new save_fp_regs()
and load_fp_regs() helper functions.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Heiko Carstens
13a8a519ca s390/fpu: use lfpc instead of sfpc instruction
The only user of sfpc_safe() needs to read the new fpc register value
from memory before it is set with sfpc.

Avoid this indirection and use lfpc, which reads the new value from
memory. Also add the "fpu_" prefix to have a common name space for fpu
related inline assemblies, and provide memory access instrumentation.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Heiko Carstens
fd2527f209 s390/fpu: move, rename, and merge header files
Move, rename, and merge the fpu and vx header files. This way fpu header
files have a consistent naming scheme (fpu*.h).

Also get rid of the fpu subdirectory and move header files to asm
directory, so that all fpu and vx header files can be found at the same
location.

Merge internal.h header file into other header files, since the internal
helpers are used at many locations. so those helper functions are really
not internal.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Heiko Carstens
31d3ec15dc s390/fpu: various coding style changes
Address various checkpatch warnings, adjust whitespace, and try to increase
readability. This is just preparation, in order to avoid that subsequent
patches contain any distracting drive-by coding style changes.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Heiko Carstens
b6b842becd s390/fpu: use KERNEL_VXR_LOW instead of KERNEL_VXR_V0V7
Use KERNEL_VXR_LOW instead of KERNEL_VXR_V0V7 for configurations without
vector registers in order to decide if floating point registers need to be
saved and restored.

Kernel FPU areas which use floating point registers are supposed to use the
KERNEL_FPR mask, however users may also open-code this and specify
KERNEL_VXR_V0V7 and/or KERNEL_VXR_V8V15. If only KERNEL_VXR_V8V15 is
specified floating point registers wouldn't be saved and restored. Improve
this and check for both bits.

There are currently no users where this would fix a bug.

Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Heiko Carstens
9e96afab8c s390/nmi: remove register validation code
Remove the historic machine check handler code which validates registers.
Registers are automatically validated as part of the machine check handling
sequence (see Principles of Operation, Machine-Check Handling chapter,
Validation).

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:14 +01:00
Gerald Schaefer
e8054eaeb5 s390/setup: fix virtual vs physical address confusion
Fix virtual vs physical address confusion. This does not fix a bug
since virtual and physical address spaces are currently the same.

/proc/iomem should report the physical address ranges, so use __pa_symbol()
for resource registration, similar to other architectures.

Signed-off-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-16 14:30:12 +01:00
Heiko Carstens
616c4ea9bc s390/vdso: remove unused ENTRY in linker scripts
When linking vdso64.so.dbg with ld.lld, there is a warning about not
finding _start for the starting address:

  ld.lld: warning: cannot find entry symbol _start; not setting start address

Fix this by removing the unused ENTRY in both vdso linker scripts. See
commit e247172854 ("powerpc/vdso: Remove unused ENTRY in linker
scripts"), which solved the same problem for powerpc, for further details.

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:53 +01:00
Nathan Chancellor
a691c8a6ef s390: vmlinux.lds.S: explicitly keep various sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are some warnings around certain
ELF sections:

  s390-linux-ld: warning: orphan section `.dynstr' from `arch/s390/kernel/head64.o' being placed in section `.dynstr'
  s390-linux-ld: warning: orphan section `.dynamic' from `arch/s390/kernel/head64.o' being placed in section `.dynamic'
  s390-linux-ld: warning: orphan section `.hash' from `arch/s390/kernel/head64.o' being placed in section `.hash'
  s390-linux-ld: warning: orphan section `.gnu.hash' from `arch/s390/kernel/head64.o' being placed in section `.gnu.hash'

Explicitly keep those sections like other architectures when
CONFIG_RELOCATABLE is enabled, which is always true for s390.

[hca@linux.ibm.com: keep sections instead of discarding]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-4-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Nathan Chancellor
30226853d6 s390: vmlinux.lds.S: explicitly handle '.got' and '.plt' sections
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around the
GOT and PLT sections:

  s390-linux-ld: warning: orphan section `.plt' from `arch/s390/kernel/head64.o' being placed in section `.plt'
  s390-linux-ld: warning: orphan section `.got' from `arch/s390/kernel/head64.o' being placed in section `.got'
  s390-linux-ld: warning: orphan section `.got.plt' from `arch/s390/kernel/head64.o' being placed in section `.got.plt'
  s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/kernel/head64.o' being placed in section `.iplt'
  s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/kernel/head64.o' being placed in section `.igot.plt'

  s390-linux-ld: warning: orphan section `.iplt' from `arch/s390/boot/head.o' being placed in section `.iplt'
  s390-linux-ld: warning: orphan section `.igot.plt' from `arch/s390/boot/head.o' being placed in section `.igot.plt'
  s390-linux-ld: warning: orphan section `.got' from `arch/s390/boot/head.o' being placed in section `.got'

Currently, only the '.got' section is actually emitted in the final
binary. In a manner similar to other architectures, put the '.got'
section near the '.data' section and coalesce the PLT sections,
checking that the final section is zero sized, which is a safe/tested
approach versus full discard.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-3-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Nathan Chancellor
bdf2cd27a3 s390: vmlinux.lds.S: handle '.data.rel' sections explicitly
When building with CONFIG_LD_ORPHAN_WARN after selecting
CONFIG_ARCH_HAS_LD_ORPHAN_WARN, there are a lot of warnings around
'.data.rel' sections:

  s390-linux-ld: warning: orphan section `.data.rel' from `kernel/sched/build_utility.o' being placed in section `.data.rel'
  s390-linux-ld: warning: orphan section `.data.rel.local' from `kernel/sched/build_utility.o' being placed in section `.data.rel.local'
  s390-linux-ld: warning: orphan section `.data.rel.ro' from `kernel/sched/build_utility.o' being placed in section `.data.rel.ro'
  s390-linux-ld: warning: orphan section `.data.rel.ro.local' from `kernel/sched/build_utility.o' being placed in section `.data.rel.ro.local'

Describe these in vmlinux.lds.S so there is no more warning and the
sections are placed consistently between linkers.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20240207-s390-lld-and-orphan-warn-v1-2-8a665b3346ab@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-14 13:50:52 +01:00
Heiko Carstens
340750c13c s390/switch_to: use generic header file
Move the switch_to() implementation to process.c and use the generic
switch_to.h header file instead, like some other architectures.

This addresses also the oddity that the old switch_to() implementation
assigns the return value of __switch_to() to 'prev' instead of 'last',
like it should.

Remove also all includes of switch_to.h from C files, except process.c.

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-12 15:03:33 +01:00
Heiko Carstens
304103736b s390/acrs: cleanup access register handling
save_access_regs() and restore_access_regs() are only available by
including switch_to.h. This is done by a couple of C files, which have
nothing to do with switch_to(), but only need these functions.

Move both functions to a new header file and improve the implementation:

- Get rid of typedef

- Add memory access instrumentation support

- Use long displacement instructions lamy/stamy instead of lam/stam - all
  current users end up with better code because of this

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-12 15:03:33 +01:00
Heiko Carstens
8d16ce1488 s390/fpu: make use of __uninitialized macro
Code sections in s390 specific kernel code which use floating point or
vector registers all come with a 520 byte stack variable to save already in
use registers, if required.

With INIT_STACK_ALL_PATTERN or INIT_STACK_ALL_ZERO enabled this variable
will always be initialized on function entry in addition to saving register
contents, which contradicts the intention (performance improvement) of such
code sections.

Therefore provide a DECLARE_KERNEL_FPU_ONSTACK() macro which provides
struct kernel_fpu variables with an __uninitialized attribute, and convert
all existing code to use this.

This way only this specific type of stack variable will not be initialized,
regardless of config options.

Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20240205154844.3757121-3-hca@linux.ibm.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:16 +01:00
Ricardo B. Marliere
6695d79224 s390/time: make stp_subsys const
Now that the driver core can properly handle constant struct bus_type,
move the stp_subsys variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: "Ricardo B. Marliere" <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240204-bus_cleanup-s390_time-v1-1-d2120156982a@marliere.net
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:15 +01:00
Nathan Chancellor
0628c03934 s390/vdso: drop '-fPIC' from LDFLAGS
'-fPIC' as an option to the linker does not do what it seems like it
should. With ld.bfd, it is treated as '-f PIC', which does not make
sense based on the meaning of '-f':

  -f SHLIB, --auxiliary SHLIB Auxiliary filter for shared object symbol table

When building with ld.lld (currently under review in a GitHub pull
request), it just errors out because '-f' means nothing and neither does
'-fPIC':

  ld.lld: error: unknown argument '-fPIC'

'-fPIC' was blindly copied from CFLAGS when the vDSO stopped being
linked with '$(CC)', it should not be needed. Remove it to clear up the
build failure with ld.lld.

Fixes: 2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")
Link: https://github.com/llvm/llvm-project/pull/75643
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20240130-s390-vdso-drop-fpic-from-ldflags-v1-1-094ad104fc55@kernel.org
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:15 +01:00
Thomas Richter
eec561024b s390/diag: add missing virt_to_phys() translation to diag14()
diag14() is currently only used by the vmur device driver.  The third
parameter, called subcommand, determines the type of the first
parameter. For some subcommands the value of the first parameter is an
address to a memory buffer and needs virtual to physical address
conversion. Other subcommands interpret the first parameter is an
integer.
This doesn't fix a bug since virtual and physical addresses
are currently the same.

Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:14 +01:00
Sven Schnelle
fc17e992e1 s390/time: improve steering precision
The common timekeeping code steers the clock by adjusting the multiplier
value of the clock. With the current value of 1000 precision is lost
when the clock is steered with a userspace daemon. Increase the multiplier
and the shift values to increase precision.

Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:14 +01:00
Thomas Richter
5d8cc70c36 s390/pai_crypto: return proper error code in paicrypt_init
paicrypt_init() return incorrect error code in case the number
of PAI crypto counters is too high. Change the return code to
-E2BIG.

Please merge with d0b0efedc7fe

Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:14 +01:00
Thomas Richter
d414f4ecb2 s390/pai: export number of sysfs attribute files
The number of sysfs files to be exported by the PAI device drivers
depends on the hardware version level. Use the value returned by
the hardware as the maximum number of counters to be exported
in the sysfs attribute tree.

Without the fix, older machine generation export counter names
based on paiXXXX_ctrnames static array info, which can be inaccurate,
as this array could also contain newer counter names in the
future.  This ensures proper pai counter sysfs attributes for
both newer generation and older generation processors.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:14 +01:00
Thomas Richter
3a5da4670d s390/pai_crypto: emit error on too many counters
When the device driver is initialized, it checks the number of
possible counters. Should this number be too high, emit an error
and return.

Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:13 +01:00
Thomas Richter
225d09d6e5 s390/pai: fix attr_event_free upper limit for pai device drivers
When the device drivers are initialized, a sysfs directory
is created. This contains many attributes which are allocated with
kzalloc(). Should it fail, the memory for the attributes already
created is freed in attr_event_free(). Its second parameter is number
of attribute elements to delete. This parameter is off by one.
When i. e. the 10th attribute fails to get created, attributes
numbered 0 to 9 should be deleted. Currently only attributes
numbered 0 to 8 are deleted.

Fixes: 39d62336f5 ("s390/pai: add support for cryptography counters")
Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:13 +01:00
Heiko Carstens
e98eda926b s390/hypfs_diag0c: fix virtual vs physical address confusion
Add missing virt_to_phys() translation to diag0c(). This doesn't fix a
bug since virtual and physical addresses are currently the same.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:13 +01:00
Alexander Gordeev
d8132003f8 s390/diag: fix diag26c() physical vs virtual address confusion
Fix virtual vs physical address confusion (which currently are the same).

Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-09 13:58:13 +01:00
Linus Torvalds
302d185865 s390 updates for 6.8 merge window part 2
- do not enable by default the support of 31-bit Enterprise Systems
   Architecture (ESA) ELF binaries
 
 - drop automatic CONFIG_KEXEC selection, while set CONFIG_KEXEC=y
   explicitly for defconfig and debug_defconfig only
 
 - fix zpci_get_max_io_size() to allow PCI block stores where
   normal PCI stores were used otherwise
 
 - remove unneeded tsk variable in do_exception() fault handler
 
 - __load_fpu_regs() is only called from the core kernel code.
   Therefore, remove not needed EXPORT_SYMBOL.
 
 - remove leftover comment from s390_fpregs_set() callback
 
 - few cleanups to Processor Activity Instrumentation (PAI) code
   (which perf framework is based on)
 
 - replace Wenjia Zhang with Thorsten Winkler as s390 Inter-User
   Communication Vehicle (IUCV) networking maintainer
 
 - Fix all scenarios where queues previously removed from a guest's
   Adjunct-Processor (AP) configuration do not re-appear in a reset
   state when they are subsequently made available to a guest again
 -----BEGIN PGP SIGNATURE-----
 
 iI0EABYIADUWIQQrtrZiYVkVzKQcYivNdxKlNrRb8AUCZalUTxccYWdvcmRlZXZA
 bGludXguaWJtLmNvbQAKCRDNdxKlNrRb8OZlAQDq7gZEL/ZSh1X4IQa4qPbdGeko
 ArZvcOi3SiVW3jw3FgD8CfPauKux1tliSpeRybcRkUVQoPGvoWc8kwAkCm/Juwc=
 =+Xd9
 -----END PGP SIGNATURE-----

Merge tag 's390-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull more s390 updates from Alexander Gordeev:

 - do not enable by default the support of 31-bit Enterprise Systems
   Architecture (ESA) ELF binaries

 - drop automatic CONFIG_KEXEC selection, while set CONFIG_KEXEC=y
   explicitly for defconfig and debug_defconfig only

 - fix zpci_get_max_io_size() to allow PCI block stores where normal PCI
   stores were used otherwise

 - remove unneeded tsk variable in do_exception() fault handler

 - __load_fpu_regs() is only called from the core kernel code.
   Therefore, remove not needed EXPORT_SYMBOL.

 - remove leftover comment from s390_fpregs_set() callback

 - few cleanups to Processor Activity Instrumentation (PAI) code (which
   perf framework is based on)

 - replace Wenjia Zhang with Thorsten Winkler as s390 Inter-User
   Communication Vehicle (IUCV) networking maintainer

 - Fix all scenarios where queues previously removed from a guest's
   Adjunct-Processor (AP) configuration do not re-appear in a reset
   state when they are subsequently made available to a guest again

* tag 's390-6.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vfio-ap: do not reset queue removed from host config
  s390/vfio-ap: reset queues associated with adapter for queue unbound from driver
  s390/vfio-ap: reset queues filtered from the guest's AP config
  s390/vfio-ap: let on_scan_complete() callback filter matrix and update guest's APCB
  s390/vfio-ap: loop over the shadow APCB when filtering guest's AP configuration
  s390/vfio-ap: always filter entire AP matrix
  s390/net: add Thorsten Winkler as maintainer
  s390/pai_ext: split function paiext_push_sample
  s390/pai_ext: rework function paiext_copy argments
  s390/pai: rework paiXXX_start and paiXXX_stop functions
  s390/pai_crypto: split function paicrypt_push_sample
  s390/pai: rework paixxxx_getctr interface
  s390/ptrace: remove leftover comment
  s390/fpu: remove __load_fpu_regs() export
  s390/mm,fault: remove not needed tsk variable
  s390/pci: fix max size calculation in zpci_memcpy_toio()
  s390/kexec: do not automatically select KEXEC option
  s390/compat: change default for CONFIG_COMPAT to "n"
2024-01-18 14:11:25 -08:00
Linus Torvalds
09d1c6a80f Generic:
- Use memdup_array_user() to harden against overflow.
 
 - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all architectures.
 
 - Clean up Kconfigs that all KVM architectures were selecting
 
 - New functionality around "guest_memfd", a new userspace API that
   creates an anonymous file and returns a file descriptor that refers
   to it.  guest_memfd files are bound to their owning virtual machine,
   cannot be mapped, read, or written by userspace, and cannot be resized.
   guest_memfd files do however support PUNCH_HOLE, which can be used to
   switch a memory area between guest_memfd and regular anonymous memory.
 
 - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
   per-page attributes for a given page of guest memory; right now the
   only attribute is whether the guest expects to access memory via
   guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
   TDX or ARM64 pKVM is checked by firmware or hypervisor that guarantees
   confidentiality (AMD PSP, Intel TDX module, or EL2 in the case of pKVM).
 
 x86:
 
 - Support for "software-protected VMs" that can use the new guest_memfd
   and page attributes infrastructure.  This is mostly useful for testing,
   since there is no pKVM-like infrastructure to provide a meaningfully
   reduced TCB.
 
 - Fix a relatively benign off-by-one error when splitting huge pages during
   CLEAR_DIRTY_LOG.
 
 - Fix a bug where KVM could incorrectly test-and-clear dirty bits in non-leaf
   TDP MMU SPTEs if a racing thread replaces a huge SPTE with a non-huge SPTE.
 
 - Use more generic lockdep assertions in paths that don't actually care
   about whether the caller is a reader or a writer.
 
 - let Xen guests opt out of having PV clock reported as "based on a stable TSC",
   because some of them don't expect the "TSC stable" bit (added to the pvclock
   ABI by KVM, but never set by Xen) to be set.
 
 - Revert a bogus, made-up nested SVM consistency check for TLB_CONTROL.
 
 - Advertise flush-by-ASID support for nSVM unconditionally, as KVM always
   flushes on nested transitions, i.e. always satisfies flush requests.  This
   allows running bleeding edge versions of VMware Workstation on top of KVM.
 
 - Sanity check that the CPU supports flush-by-ASID when enabling SEV support.
 
 - On AMD machines with vNMI, always rely on hardware instead of intercepting
   IRET in some cases to detect unmasking of NMIs
 
 - Support for virtualizing Linear Address Masking (LAM)
 
 - Fix a variety of vPMU bugs where KVM fail to stop/reset counters and other state
   prior to refreshing the vPMU model.
 
 - Fix a double-overflow PMU bug by tracking emulated counter events using a
   dedicated field instead of snapshotting the "previous" counter.  If the
   hardware PMC count triggers overflow that is recognized in the same VM-Exit
   that KVM manually bumps an event count, KVM would pend PMIs for both the
   hardware-triggered overflow and for KVM-triggered overflow.
 
 - Turn off KVM_WERROR by default for all configs so that it's not
   inadvertantly enabled by non-KVM developers, which can be problematic for
   subsystems that require no regressions for W=1 builds.
 
 - Advertise all of the host-supported CPUID bits that enumerate IA32_SPEC_CTRL
   "features".
 
 - Don't force a masterclock update when a vCPU synchronizes to the current TSC
   generation, as updating the masterclock can cause kvmclock's time to "jump"
   unexpectedly, e.g. when userspace hotplugs a pre-created vCPU.
 
 - Use RIP-relative address to read kvm_rebooting in the VM-Enter fault paths,
   partly as a super minor optimization, but mostly to make KVM play nice with
   position independent executable builds.
 
 - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
   CONFIG_HYPERV as a minor optimization, and to self-document the code.
 
 - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV "emulation"
   at build time.
 
 ARM64:
 
 - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB
   base granule sizes. Branch shared with the arm64 tree.
 
 - Large Fine-Grained Trap rework, bringing some sanity to the
   feature, although there is more to come. This comes with
   a prefix branch shared with the arm64 tree.
 
 - Some additional Nested Virtualization groundwork, mostly
   introducing the NV2 VNCR support and retargetting the NV
   support to that version of the architecture.
 
 - A small set of vgic fixes and associated cleanups.
 
 Loongarch:
 
 - Optimization for memslot hugepage checking
 
 - Cleanup and fix some HW/SW timer issues
 
 - Add LSX/LASX (128bit/256bit SIMD) support
 
 RISC-V:
 
 - KVM_GET_REG_LIST improvement for vector registers
 
 - Generate ISA extension reg_list using macros in get-reg-list selftest
 
 - Support for reporting steal time along with selftest
 
 s390:
 
 - Bugfixes
 
 Selftests:
 
 - Fix an annoying goof where the NX hugepage test prints out garbage
   instead of the magic token needed to run the test.
 
 - Fix build errors when a header is delete/moved due to a missing flag
   in the Makefile.
 
 - Detect if KVM bugged/killed a selftest's VM and print out a helpful
   message instead of complaining that a random ioctl() failed.
 
 - Annotate the guest printf/assert helpers with __printf(), and fix the
   various bugs that were lurking due to lack of said annotation.
 
 There are two non-KVM patches buried in the middle of guest_memfd support:
 
   fs: Rename anon_inode_getfile_secure() and anon_inode_getfd_secure()
   mm: Add AS_UNMOVABLE to mark mapping as completely unmovable
 
 The first is small and mostly suggested-by Christian Brauner; the second
 a bit less so but it was written by an mm person (Vlastimil Babka).
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmWcMWkUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroO15gf/WLmmg3SET6Uzw9iEq2xo28831ZA+
 6kpILfIDGKozV5safDmMvcInlc/PTnqOFrsKyyN4kDZ+rIJiafJdg/loE0kPXBML
 wdR+2ix5kYI1FucCDaGTahskBDz8Lb/xTpwGg9BFLYFNmuUeHc74o6GoNvr1uliE
 4kLZL2K6w0cSMPybUD+HqGaET80ZqPwecv+s1JL+Ia0kYZJONJifoHnvOUJ7DpEi
 rgudVdgzt3EPjG0y1z6MjvDBXTCOLDjXajErlYuZD3Ej8N8s59Dh2TxOiDNTLdP4
 a4zjRvDmgyr6H6sz+upvwc7f4M4p+DBvf+TkWF54mbeObHUYliStqURIoA==
 =66Ws
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "Generic:

   - Use memdup_array_user() to harden against overflow.

   - Unconditionally advertise KVM_CAP_DEVICE_CTRL for all
     architectures.

   - Clean up Kconfigs that all KVM architectures were selecting

   - New functionality around "guest_memfd", a new userspace API that
     creates an anonymous file and returns a file descriptor that refers
     to it. guest_memfd files are bound to their owning virtual machine,
     cannot be mapped, read, or written by userspace, and cannot be
     resized. guest_memfd files do however support PUNCH_HOLE, which can
     be used to switch a memory area between guest_memfd and regular
     anonymous memory.

   - New ioctl KVM_SET_MEMORY_ATTRIBUTES allowing userspace to specify
     per-page attributes for a given page of guest memory; right now the
     only attribute is whether the guest expects to access memory via
     guest_memfd or not, which in Confidential SVMs backed by SEV-SNP,
     TDX or ARM64 pKVM is checked by firmware or hypervisor that
     guarantees confidentiality (AMD PSP, Intel TDX module, or EL2 in
     the case of pKVM).

  x86:

   - Support for "software-protected VMs" that can use the new
     guest_memfd and page attributes infrastructure. This is mostly
     useful for testing, since there is no pKVM-like infrastructure to
     provide a meaningfully reduced TCB.

   - Fix a relatively benign off-by-one error when splitting huge pages
     during CLEAR_DIRTY_LOG.

   - Fix a bug where KVM could incorrectly test-and-clear dirty bits in
     non-leaf TDP MMU SPTEs if a racing thread replaces a huge SPTE with
     a non-huge SPTE.

   - Use more generic lockdep assertions in paths that don't actually
     care about whether the caller is a reader or a writer.

   - let Xen guests opt out of having PV clock reported as "based on a
     stable TSC", because some of them don't expect the "TSC stable" bit
     (added to the pvclock ABI by KVM, but never set by Xen) to be set.

   - Revert a bogus, made-up nested SVM consistency check for
     TLB_CONTROL.

   - Advertise flush-by-ASID support for nSVM unconditionally, as KVM
     always flushes on nested transitions, i.e. always satisfies flush
     requests. This allows running bleeding edge versions of VMware
     Workstation on top of KVM.

   - Sanity check that the CPU supports flush-by-ASID when enabling SEV
     support.

   - On AMD machines with vNMI, always rely on hardware instead of
     intercepting IRET in some cases to detect unmasking of NMIs

   - Support for virtualizing Linear Address Masking (LAM)

   - Fix a variety of vPMU bugs where KVM fail to stop/reset counters
     and other state prior to refreshing the vPMU model.

   - Fix a double-overflow PMU bug by tracking emulated counter events
     using a dedicated field instead of snapshotting the "previous"
     counter. If the hardware PMC count triggers overflow that is
     recognized in the same VM-Exit that KVM manually bumps an event
     count, KVM would pend PMIs for both the hardware-triggered overflow
     and for KVM-triggered overflow.

   - Turn off KVM_WERROR by default for all configs so that it's not
     inadvertantly enabled by non-KVM developers, which can be
     problematic for subsystems that require no regressions for W=1
     builds.

   - Advertise all of the host-supported CPUID bits that enumerate
     IA32_SPEC_CTRL "features".

   - Don't force a masterclock update when a vCPU synchronizes to the
     current TSC generation, as updating the masterclock can cause
     kvmclock's time to "jump" unexpectedly, e.g. when userspace
     hotplugs a pre-created vCPU.

   - Use RIP-relative address to read kvm_rebooting in the VM-Enter
     fault paths, partly as a super minor optimization, but mostly to
     make KVM play nice with position independent executable builds.

   - Guard KVM-on-HyperV's range-based TLB flush hooks with an #ifdef on
     CONFIG_HYPERV as a minor optimization, and to self-document the
     code.

   - Add CONFIG_KVM_HYPERV to allow disabling KVM support for HyperV
     "emulation" at build time.

  ARM64:

   - LPA2 support, adding 52bit IPA/PA capability for 4kB and 16kB base
     granule sizes. Branch shared with the arm64 tree.

   - Large Fine-Grained Trap rework, bringing some sanity to the
     feature, although there is more to come. This comes with a prefix
     branch shared with the arm64 tree.

   - Some additional Nested Virtualization groundwork, mostly
     introducing the NV2 VNCR support and retargetting the NV support to
     that version of the architecture.

   - A small set of vgic fixes and associated cleanups.

  Loongarch:

   - Optimization for memslot hugepage checking

   - Cleanup and fix some HW/SW timer issues

   - Add LSX/LASX (128bit/256bit SIMD) support

  RISC-V:

   - KVM_GET_REG_LIST improvement for vector registers

   - Generate ISA extension reg_list using macros in get-reg-list
     selftest

   - Support for reporting steal time along with selftest

  s390:

   - Bugfixes

  Selftests:

   - Fix an annoying goof where the NX hugepage test prints out garbage
     instead of the magic token needed to run the test.

   - Fix build errors when a header is delete/moved due to a missing
     flag in the Makefile.

   - Detect if KVM bugged/killed a selftest's VM and print out a helpful
     message instead of complaining that a random ioctl() failed.

   - Annotate the guest printf/assert helpers with __printf(), and fix
     the various bugs that were lurking due to lack of said annotation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (185 commits)
  x86/kvm: Do not try to disable kvmclock if it was not enabled
  KVM: x86: add missing "depends on KVM"
  KVM: fix direction of dependency on MMU notifiers
  KVM: introduce CONFIG_KVM_COMMON
  KVM: arm64: Add missing memory barriers when switching to pKVM's hyp pgd
  KVM: arm64: vgic-its: Avoid potential UAF in LPI translation cache
  RISC-V: KVM: selftests: Add get-reg-list test for STA registers
  RISC-V: KVM: selftests: Add steal_time test support
  RISC-V: KVM: selftests: Add guest_sbi_probe_extension
  RISC-V: KVM: selftests: Move sbi_ecall to processor.c
  RISC-V: KVM: Implement SBI STA extension
  RISC-V: KVM: Add support for SBI STA registers
  RISC-V: KVM: Add support for SBI extension registers
  RISC-V: KVM: Add SBI STA info to vcpu_arch
  RISC-V: KVM: Add steal-update vcpu request
  RISC-V: KVM: Add SBI STA extension skeleton
  RISC-V: paravirt: Implement steal-time support
  RISC-V: Add SBI STA extension definitions
  RISC-V: paravirt: Add skeleton for pv-time support
  RISC-V: KVM: Fix indentation in kvm_riscv_vcpu_set_reg_csr()
  ...
2024-01-17 13:03:37 -08:00
Thomas Richter
3046a10911 s390/pai_ext: split function paiext_push_sample
Split function paiext_push_sample() into two parts. The first part
determines the number of bytes to store as raw data in the perf sample
record. This is now function paiext_have_sample().
The second part stores the raw data in the perf event's ring buffer.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-12 14:23:27 +01:00
Thomas Richter
0dade41d16 s390/pai_ext: rework function paiext_copy argments
Change the function paiext_copy() parameter from a pointer to a
structure to two pointers to memory areas referenced. The other
members of that structure are not needed.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-12 14:23:26 +01:00
Thomas Richter
cb1259b7b5 s390/pai: rework paiXXX_start and paiXXX_stop functions
The PAI crypto counter and PAI NNPA counters start and stop functions
are streamlined. Move the conditions to invoke start and stop functions
to its respective function body and call them unconditionally.
The start and stop functions now determine how to proceed.
No functional change.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Mete Durlu <meted@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-01-12 14:23:26 +01:00