- Fix preempt_count initialization.
- Rework call_on_stack() macro to add proper type handling and avoid
possible register corruption.
- More error prone "register asm" removal and fixes.
- Fix syscall restarting when multiple signals are coming in. This adds
minimalistic trampolines to vdso so we can return from signal without
using the stack which requires pgm check handler hacks when NX is
enabled.
- Remove HAVE_IRQ_EXIT_ON_IRQ_STACK since this is no longer true after
switch to generic entry.
- Fix protected virtualization secure storage access exception handling.
- Make machine check C handler always enter with DAT enabled and move
register validation to C code.
- Fix tinyconfig boot problem by avoiding MONITOR CALL without CONFIG_BUG.
- Increase asm symbols alignment to 16 to make it consistent with
compilers.
- Enable concurrent access to the CPU Measurement Counter Facility.
- Add support for dynamic AP bus size limit and rework ap_dqap to deal
with messages greater than recv buffer.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmDpo78ACgkQjYWKoQLX
FBhapAf/UkwhYLTrEEOSmrWQaJHyOkdxcexksBPvh/GEQvDkHixHROeytRb5SBMD
JXGJsWN3g9DjiE9DLFGSK2eQfKYFoPaXWv3Tf+QIPvEwvODKZHRLVJU1VTXiF4EA
+Xu4Q0o1VtX8ashEN0IHjBycPY9l+v9Ncb2erKG+CzoRa1r5C+A+1E0sLtsNN9TW
AV+57ATCc0pgNkUfzVoG2S/QdSY+fpMWtjoPeCj/DNCpFq1eRTMRGolTsdnqGpDx
ueAozHW9Pc9TQXeWpOVM7ZgWRWJThCC4vj2W/B7ygCuyJh4N0y2rthFEg3tWUf0f
+X6KDXeijFs4Gf0ewYLZjZCpBT8/NA==
=EgkG
-----END PGP SIGNATURE-----
Merge tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Vasily Gorbik:
- Fix preempt_count initialization.
- Rework call_on_stack() macro to add proper type handling and avoid
possible register corruption.
- More error prone "register asm" removal and fixes.
- Fix syscall restarting when multiple signals are coming in. This adds
minimalistic trampolines to vdso so we can return from signal without
using the stack which requires pgm check handler hacks when NX is
enabled.
- Remove HAVE_IRQ_EXIT_ON_IRQ_STACK since this is no longer true after
switch to generic entry.
- Fix protected virtualization secure storage access exception
handling.
- Make machine check C handler always enter with DAT enabled and move
register validation to C code.
- Fix tinyconfig boot problem by avoiding MONITOR CALL without
CONFIG_BUG.
- Increase asm symbols alignment to 16 to make it consistent with
compilers.
- Enable concurrent access to the CPU Measurement Counter Facility.
- Add support for dynamic AP bus size limit and rework ap_dqap to deal
with messages greater than recv buffer.
* tag 's390-5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (41 commits)
s390: preempt: Fix preempt_count initialization
s390/linkage: increase asm symbols alignment to 16
s390: rename CALL_ON_STACK_NORETURN() to call_on_stack_noreturn()
s390: add type checking to CALL_ON_STACK_NORETURN() macro
s390: remove old CALL_ON_STACK() macro
s390/softirq: use call_on_stack() macro
s390/lib: use call_on_stack() macro
s390/smp: use call_on_stack() macro
s390/kexec: use call_on_stack() macro
s390/irq: use call_on_stack() macro
s390/mm: use call_on_stack() macro
s390: introduce proper type handling call_on_stack() macro
s390/irq: simplify on_async_stack()
s390/irq: inline do_softirq_own_stack()
s390/irq: simplify do_softirq_own_stack()
s390/ap: get rid of register asm in ap_dqap()
s390: rename PIF_SYSCALL_RESTART to PIF_EXECVE_PGSTE_RESTART
s390: move restart of execve() syscall
s390/signal: remove sigreturn on stack
s390/signal: switch to using vdso for sigreturn and syscall restart
...
S390's init_idle_preempt_count(p, cpu) doesn't actually let us initialize the
preempt_count of the requested CPU's idle task: it unconditionally writes
to the current CPU's. This clearly conflicts with idle_threads_init(),
which intends to initialize *all* the idle tasks, including their
preempt_count (or their CPU's, if the arch uses a per-CPU preempt_count).
Unfortunately, it seems the way s390 does things doesn't let us initialize
every possible CPU's preempt_count early on, as the pages where this
resides are only allocated when a CPU is brought up and are freed when it
is brought down.
Let the arch-specific code set a CPU's preempt_count when its lowcore is
allocated, and turn init_idle_preempt_count() into an empty stub.
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Link: https://lore.kernel.org/r/20210707163338.1623014-1-valentin.schneider@arm.com
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Lower case matches the call_on_stack() macro and is easier to read.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Make on_async_stack() a bit more readable, even though as usual it
depends if one considers "!!!" readable or not.
At least the new construct to check if the async stack is in use or
not is a bit shorter and generates slightly better code.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Move do_softirq_own_stack() to proper header file so it can be
inlined; saving a few cycles.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
do_softirq_own_stack() is always called from task context and
therefore it is not necessary to check if the async stack is
currently used.
Remove the check and directly switch to async stack.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
PIF_SYSCALL_RESTART is now only used to restart execve when loading
PGSTE binaries. Rename the flag to reflect that, and avoid people
thinking that this bit has anything to do with generic syscall
restarting.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
On s390, execve might have to be restarted for PGSTE binaries
like kvm. In the past this was done via the PIF_SYSCALL_RESTART
bit. However, with the recent changes, syscalls are now restarted
differently. Now that execve() is the only call that might get
restarted via PIF_SYSCALL_RESTART, move the loop to do_syscall().
This also has the advantage that the restart is no longer visible
to userspace.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
{rt_}sigreturn is now called from the vdso, so we no longer
need the svc on the stack, and therefore no hack to support that
mechanism on machines with non-executable stack.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
with generic entry, there's a bug when it comes to restarting of signals.
The failing sequence is:
a) a signal is coming in, and no handler is registered, so the lower
part of arch_do_signal_or_restart() in arch/s390/kernel/signal.c
sets PIF_SYSCALL_RESTART.
b) a second signal gets pending while the kernel is still in the exit
loop, and for that one, a handler exists.
c) The first part of arch_do_signal_or_restart() is called. That part
calls handle_signal(), which sets up stack + registers for handling
the signal.
d) __do_syscall() in arch/s390/kernel/syscall.c checks for
PIF_SYSCALL_RESTART right before leaving to userspace. If it is set,
it restart's the syscall. However, the registers are already setup
for handling a signal from c). The syscall is now restarted with the
wrong arguments.
Change the code to:
- use vdso for syscall_restart() instead of PIF_SYSCALL_RESTART because
we cannot rewind and go back to userspace on s390 because the system call
number might be encoded in the svc instruction.
- for all other syscalls we rewind the PSW and return to userspace.
Cc: <stable@kernel.org> # v5.12+ d57778feb9: s390/vdso: always enable vdso
Cc: <stable@kernel.org> # v5.12+ 686341f254: s390/vdso64: add sigreturn,rt_sigreturn and restart_syscall
Cc: <stable@kernel.org> # v5.12+ 43e1f76b0b: s390/vdso: rename VDSO64_LBASE to VDSO_LBASE
Cc: <stable@kernel.org> # v5.12+ 779df22487: s390/vdso: add minimal compat vdso
Cc: <stable@kernel.org> # v5.12+
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add a small vdso for 31 bit compat application that provides
trampolines for calls to sigreturn,rt_sigreturn,syscall_restart.
This is requird for moving these syscalls away from the signal
frame to the vdso. Note that this patch effectively disables
CONFIG_COMPAT when using clang to compile the kernel. clang
doesn't support 31 bit mode.
We want to redirect sigreturn and restart_syscall to the vdso. However,
the kernel cannot parse the ELF vdso file, so we need to generate header
files which contain the offsets of the syscall instructions in the vdso
page.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Will be used by both vdso32 and vdso64, so change the name.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add minimalistic trampolines to vdso64 so we can return from signal
without using the stack which requires pgm check handler hacks when
NX is enabled.
restart_syscall will be called from vdso to work around the architectural
limitation that the syscall number might be encoded in the svc instruction,
and therefore can not be changed.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
With the upcoming move of the svc sigreturn instruction from
the signal frame to vdso we need to have vdso always enabled.
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
tinyconfig fails to boot, because without CONFIG_BUG report_bug()
always returns BUG_TRAP_TYPE_BUG, which causes mc 0,0 in
test_monitor_call() to panic. Fix by skipping the test without
CONFIG_BUG.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit cf6acb8bdb ("s390/cpumf: Add support for complete counter set extraction")
allows access to the CPU Measurement Counter Facility via character
device /dev/hwctr. The access was exclusive via this device or
via perf_event_open() system call. Only one path at a time was
permitted. The CPU Measurement Counter Facility device driver blocked
access to other processes.
This patch removes this restriction and allows concurrent access to
the CPU Measurement Counter Facility from multiple processes at the same
time via perf_event_open() SVC and via /dev/hwctr device. The access
via /dev/hwctr device is still exclusive, only one process is allowed to
access this device.
This patch
- moves the /dev/hwctr device access from file perf_cpum_cf_diag.c.
to file perf_cpum_cf.c.
- use only one trace buffer .../s390dbf/cpum_cf.
- remove cfset_csd structure and includes its members it into the
structure cpu_cf_events. This results in one data structure and
simplifies the access.
- rework function familiy ctr_set_enable, ctr_set_disable, ctr_set_start
and ctr_set_stop which operate on a counter set number.
Now they operate on a counter set bit mask.
- move CF_DIAG event functionality to file perf_cpum_cf.c. It now
contains the complete functionality of the CPU Measurement Counter
Facility:
- Performance measurement support for counters using perf stat.
- Support for complete counter set extraction with device /dev/hwctr.
- Support for counter set extraction event CF_DIAG attached to
samples using perf record.
- removes file perf_cpum_cf_diag.c
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
This update partially reverts commit 3037a52f98 ("s390/nmi:
do register validation as early as possible").
Storage error checks and control registers validation are left
in the assembler code, since correct ASCEs and page tables are
required to enable DAT - which is done before the C handler is
entered.
System damage, kernel instruction address and PSW MWP checks
are left in the assembler code as well, since there is no way
to proceed if one of these checks is failed.
The getcpu vdso syscall reads CPU number from the programmable
field of the TOD clock. Disregard the TOD programmable register
validity bit and load the CPU number into the TOD programmable
field unconditionally.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The magic string "S390EP" at offset 0x10008 indicated to the decompressed
kernel that it was booted by the decompressor. Introduce a new bootdata
flag instead which conveys the same information in an explicit and
a cleaner way. But keep the magic string because it is a kernel ABI.
Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reviewed-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current storage errors tackling is wrong - the DAT is
enabled in assembler code before the actual storage checks
in C half are executed. In case the page tables themselves
are damaged such approach is not going to work.
With this update unrecoverable storage errors are not
passed to C code for handling, but rather the machine
is stopped right away. The only exception to this flow
is when a machine check occurred in KVM guest - in this
case the errors are reinjected by the handler.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The machine check handler must be entered with DAT disabled
in case control registers are corrupted or a storage error
happened and we can not tell if such error corresponds to a
page table.
Both of described conditions end up in stopping all CPUs and
entering the disabled wait in C half of the handler. However,
the storage errors are still checked after the DAT is enabled
and C code is entered. In case a page table is damaged such
flow is not expected to work.
This update paves the way for moving the storage error checks
from C to assembler half. All fatal errors that can only be
handled with DAT disabled are handled in assembler half also.
As result, the C half is only entered if the DAT is secured.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
In case of the !CONFIG_KVM use "jz" instead of "jnz" when
detecting user mode and get rid of unnecessary jump as result.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Christia Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Factor out SIEEXIT macro and use it instead of cleanup_sie
routine. As a side effect %r13 and %r14 are spared.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Reviewed-by: Christia Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Turns out that the bit 61 in the TEID is not always 1 and if that's
the case the address space ID and the address are
unpredictable. Without an address and its address space ID we can't
export memory and hence we can only send a SIGSEGV to the process or
panic the kernel depending on who caused the exception.
Unfortunately bit 61 is only reliable if we have the "misc" UV feature
bit.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 084ea4d611 ("s390/mm: add (non)secure page access exceptions handlers")
Cc: stable@vger.kernel.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Rework inline asm to get rid of error prone "register asm" constructs,
which are problematic especially when code instrumentation is enabled. In
particular introduce and use register pair union to allocate even/odd
register pairs. Unfortunately this breaks compatibility with older
clang compilers and minimum clang version for s390 has been raised to 13.
https://lore.kernel.org/linux-next/CAK7LNARuSmPCEy-ak0erPrPTgZdGVypBROFhtw+=3spoGoYsyw@mail.gmail.com/
- Fix gcc 11 warnings, which triggered various minor reworks all over
the code.
- Add zstd kernel image compression support.
- Rework boot CPU lowcore handling.
- De-duplicate and move kernel memory layout setup logic earlier.
- Few fixes in preparation for FORTIFY_SOURCE performing compile-time
and run-time field bounds checking for mem functions.
- Remove broken and unused power management support leftovers in s390
drivers.
- Disable stack-protector for decompressor and purgatory to fix buildroot
build.
- Fix vt220 sclp console name to match the char device name.
- Enable HAVE_IOREMAP_PROT and add zpci_set_irq()/zpci_clear_irq() in
zPCI code.
- Remove some implausible WARN_ON_ONCEs and remove arch specific counter
transaction call backs in favour of default transaction handling in
perf code.
- Extend/add new uevents for online/config/mode state changes of
AP card / queue device in zcrypt.
- Minor entry and ccwgroup code improvements.
- Other small various fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmDhuTEACgkQjYWKoQLX
FBjVlggAgDFBkDjlyfvrm4xzmHi7BJMmhrTJIONsSz+3tcA4/u5kE+Hrdrqxm0Uh
ZH4MXBxn4q4Fmoomhu5w5ZDe8o2ip0aN9fFNdsBoP8hurmQbL/IbdTnBETKMrKpV
XpogU2G7p+2nQ0+9+o6PS/vWlZhI88NVh8dWyRd2+5/XdMycgLv2Qm7NpQoACVw1
CbUvxP2PlpZ0wltLvNBKPg1xXMZa3GS0wbVUsS2jiWcr/3VzCqfTHenZJ/RadoE6
axG99QXCbLDMsJgVQcXtlI8K6Z461fAwbNtWZWC+Uq7o5pYuUFW1dovMg9WWF+7T
lFNqXyyNy5wwITRkvuzjlVTE8yzYYg==
=ADZ4
-----END PGP SIGNATURE-----
Merge tag 's390-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Rework inline asm to get rid of error prone "register asm"
constructs, which are problematic especially when code
instrumentation is enabled.
In particular introduce and use register pair union to allocate
even/odd register pairs. Unfortunately this breaks compatibility with
older clang compilers and minimum clang version for s390 has been
raised to 13.
https://lore.kernel.org/linux-next/CAK7LNARuSmPCEy-ak0erPrPTgZdGVypBROFhtw+=3spoGoYsyw@mail.gmail.com/
- Fix gcc 11 warnings, which triggered various minor reworks all over
the code.
- Add zstd kernel image compression support.
- Rework boot CPU lowcore handling.
- De-duplicate and move kernel memory layout setup logic earlier.
- Few fixes in preparation for FORTIFY_SOURCE performing compile-time
and run-time field bounds checking for mem functions.
- Remove broken and unused power management support leftovers in s390
drivers.
- Disable stack-protector for decompressor and purgatory to fix
buildroot build.
- Fix vt220 sclp console name to match the char device name.
- Enable HAVE_IOREMAP_PROT and add zpci_set_irq()/zpci_clear_irq() in
zPCI code.
- Remove some implausible WARN_ON_ONCEs and remove arch specific
counter transaction call backs in favour of default transaction
handling in perf code.
- Extend/add new uevents for online/config/mode state changes of AP
card / queue device in zcrypt.
- Minor entry and ccwgroup code improvements.
- Other small various fixes and improvements all over the code.
* tag 's390-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (91 commits)
s390/dasd: use register pair instead of register asm
s390/qdio: get rid of register asm
s390/ioasm: use symbolic names for asm operands
s390/ioasm: get rid of register asm
s390/cmf: get rid of register asm
s390/lib,string: get rid of register asm
s390/lib,uaccess: get rid of register asm
s390/string: get rid of register asm
s390/cmpxchg: use register pair instead of register asm
s390/mm,pages-states: get rid of register asm
s390/lib,xor: get rid of register asm
s390/timex: get rid of register asm
s390/hypfs: use register pair instead of register asm
s390/zcrypt: Switch to flexible array member
s390/speculation: Use statically initialized const for instructions
virtio/s390: get rid of open-coded kvm hypercall
s390/pci: add zpci_set_irq()/zpci_clear_irq()
scripts/min-tool-version.sh: Raise minimum clang version to 13.0.0 for s390
s390/ipl: use register pair instead of register asm
s390/mem_detect: fix tprot() program check new psw handling
...
Merge more updates from Andrew Morton:
"190 patches.
Subsystems affected by this patch series: mm (hugetlb, userfaultfd,
vmscan, kconfig, proc, z3fold, zbud, ras, mempolicy, memblock,
migration, thp, nommu, kconfig, madvise, memory-hotplug, zswap,
zsmalloc, zram, cleanups, kfence, and hmm), procfs, sysctl, misc,
core-kernel, lib, lz4, checkpatch, init, kprobes, nilfs2, hfs,
signals, exec, kcov, selftests, compress/decompress, and ipc"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (190 commits)
ipc/util.c: use binary search for max_idx
ipc/sem.c: use READ_ONCE()/WRITE_ONCE() for use_global_lock
ipc: use kmalloc for msg_queue and shmid_kernel
ipc sem: use kvmalloc for sem_undo allocation
lib/decompressors: remove set but not used variabled 'level'
selftests/vm/pkeys: exercise x86 XSAVE init state
selftests/vm/pkeys: refill shadow register after implicit kernel write
selftests/vm/pkeys: handle negative sys_pkey_alloc() return code
selftests/vm/pkeys: fix alloc_random_pkey() to make it really, really random
kcov: add __no_sanitize_coverage to fix noinstr for all architectures
exec: remove checks in __register_bimfmt()
x86: signal: don't do sas_ss_reset() until we are certain that sigframe won't be abandoned
hfsplus: report create_date to kstat.btime
hfsplus: remove unnecessary oom message
nilfs2: remove redundant continue statement in a while-loop
kprobes: remove duplicated strong free_insn_page in x86 and s390
init: print out unknown kernel parameters
checkpatch: do not complain about positive return values starting with EPOLL
checkpatch: improve the indented label test
checkpatch: scripts/spdxcheck.py now requires python3
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmDcl7AACgkQnJ2qBz9k
QNnsBQf+LBAPsfykQ/f8EdHErO1lfbVTmwf2g/JzTkjrIVZTZ6Ic47aCIiFxgHU2
Js9ufaPxpsbbopzpn2PAoCUzxNsZDqgXtnC03MOUAqoSFbAvgLHz2sQwjqeYJUGQ
P6n7VipEA/qBVpQI5zeCUhHYcahoNrRjSLzaFnE2Z8CrQYQ6Ry9gVEhduvu2OTru
62cWlAWlTJfx/FcR1Y0F/ZznnNSKMiAHcEe3F6Beztplg2ooq+z6FclJYrkmnxMq
SXSOsqTCdi1/oFx36NpvLkykrIS9I7N/iqCnKwbm6X+nyZZKyAwYZhWVqkbozPPu
+u1Ppq8o0IuWwEA6/UAmxgAO3m/Gkw==
=tn0h
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc fs updates from Jan Kara:
"The new quotactl_fd() syscall (remake of quotactl_path() syscall that
got introduced & disabled in 5.13 cycle), and couple of udf, reiserfs,
isofs, and writeback fixes and cleanups"
* tag 'fs_for_v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
writeback: fix obtain a reference to a freeing memcg css
quota: remove unnecessary oom message
isofs: remove redundant continue statement
quota: Wire up quotactl_fd syscall
quota: Change quotactl_path() systcall to an fd-based one
reiserfs: Remove unneed check in reiserfs_write_full_page()
udf: Fix NULL pointer dereference in udf_symlink function
reiserfs: add check for invalid 1st journal block
free_insn_page() in x86 and s390 is same with the common weak function in
kernel/kprobes.c. Plus, the comment "Recover page to RW mode before
releasing it" in x86 seems insensible to be there since resetting mapping
is done by common code in vfree() of module_memfree(). So drop these two
duplicated strong functions and related comment, then mark the common one
in kernel/kprobes.c strong.
Link: https://lkml.kernel.org/r/20210608065736.32656-1-song.bao.hua@hisilicon.com
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Qi Liu <liuqi115@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kernel.h is being used as a dump for all kinds of stuff for a long time.
Here is the attempt to start cleaning it up by splitting out panic and
oops helpers.
There are several purposes of doing this:
- dropping dependency in bug.h
- dropping a loop by moving out panic_notifier.h
- unload kernel.h from something which has its own domain
At the same time convert users tree-wide to use new headers, although for
the time being include new header back to kernel.h to avoid twisted
indirected includes for existing users.
[akpm@linux-foundation.org: thread_info.h needs limits.h]
[andriy.shevchenko@linux.intel.com: ia64 fix]
Link: https://lkml.kernel.org/r/20210520130557.55277-1-andriy.shevchenko@linux.intel.com
Link: https://lkml.kernel.org/r/20210511074137.33666-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Co-developed-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Wei Liu <wei.liu@kernel.org>
Acked-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sebastian Reichel <sre@kernel.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Stephen Boyd <sboyd@kernel.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow
the flexible utilization of SMT siblings, without exposing
untrusted domains to information leaks & side channels, plus
to ensure more deterministic computing performance on SMT
systems used by heterogenous workloads.
There's new prctls to set core scheduling groups, which
allows more flexible management of workloads that can share
siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve
'memcache'-like workloads.
- "Age" (decay) average idle time, to better track & improve workloads
such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked
via /sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable
it at runtime if tooling needs it. Use static keys and
other optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZcPoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g3yw//WfhIqy7Psa9d/MBMjQDRGbTuO4+w22Dj
vmWFU44Q4KJxQHWeIgUlrK+dzvYWvNmflUs2CUUOiDVzxFTHMIyBtL4qCBUbx4Ns
vKAcB9wsWZge2o3WzZqpProRhdoRaSKw8egUr2q7rACVBkckY7eGP/OjWxXU8BdA
b7D0LPWwuIBFfN4pFYeCDLn32Dqr9s6Chyj+ZecabdG7EE6Gu+f1diVcxy7JE/mc
4WWL0D1RqdgpGrBEuMJIxPYekdrZiuy4jtEbztz5gbTBteN1cj3BLfqn0Pc/e6rO
Vyuc5mXCAmzRVi18z6g6bsVl+IA/nrbErENB2OHOhOYtqiZxqGTd4GPWZszMyY17
5AsEO5+5pcaBsy4gyp09qURggBu9zhJnMVmOI3rIHZkmkhwzc6uUJlyhDCTiFWOz
3ZF3LjbZEyCKodMD8qMHbs3axIBpIfZqjzkvSKyFnvfXEGVytVse7NUuWtQ36u92
GnURxVeYY1TDVXvE1Y8owNKMxknKQ6YRlypP7Dtbeo/qG6hShp0xmS7qDLDi0ybZ
ZlK+bDECiVoDf3nvJo+8v5M82IJ3CBt4UYldeRJsa1YCK/FsbK8tp91fkEfnXVue
+U6LPX0AmMpXacR5HaZfb3uBIKRw/QMdP/7RFtBPhpV6jqCrEmuqHnpPQiEVtxwO
UmG7bt94Trk=
=3VDr
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogenous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmDZaxMRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hPgw//f9SnGzFoP1uR5TBqM8j/QHulMewew/iD
dM5lh2emdmqHWYPBeRxUHgag38K2Golr3Y+NxLA3R+RMx+OZQe8Mz/wYvPQcBvsV
k1HHImU3GRMn4GM7GwxH3vPIottDUx3mNS2J6pzlw3kwRUVqrxUdj/0/pSY/4eJ7
ZT4uq4yLV83Jd3qioU7o7e/u6MrdNIIcAXRpVDdE9Mm1+kWXSVN7/h3Vsiz4tj5E
iS+UXEtSc1a2mnmekv63pYkJHHNUb6guD8jgI/wrm1KIFGjDRifM+3TV6R/kB96/
TfD2LhCcTShfSp8KI191pgV7/NQbB/PmLdSYmff3rTBiii4cqXuCygJCHInZ09z0
4fTSSqM6aHg7kfTQyOCp+DUQ+9vNVXWo8mxt9c6B8xA0GyCI3zhjQ4UIiSUWRpjs
Be5ZyF0kNNuPxYrKFnGnBf8+51DURpCz3sDdYRuK4KNkj1+4ZvJo/KzGTMUUIE4B
IDQG6wDP5Kb388eRDtKrG5X7IXg+L5F/kezin60j0QF5MwDgxirT217teN8H1lNn
YgWMjRK8Tw0flUJsbCxa51/nl93UtByB+fIRIc88MSeLxcI6/ORW+TxBBEqkYm5Z
6BLFtmHSuAqAXUuyZXSGLcW7XLJvIaDoHgvbDn6l4g7FMWHqPOIq6nJQY3L8ben2
e+fQrGh4noI=
=20Vc
-----END PGP SIGNATURE-----
Merge tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Platform PMU driver updates:
- x86 Intel uncore driver updates for Skylake (SNR) and Icelake (ICX) servers
- Fix RDPMC support
- Fix [extended-]PEBS-via-PT support
- Fix Sapphire Rapids event constraints
- Fix :ppp support on Sapphire Rapids
- Fix fixed counter sanity check on Alder Lake & X86_FEATURE_HYBRID_CPU
- Other heterogenous-PMU fixes
- Kprobes:
- Remove the unused and misguided kprobe::fault_handler callbacks.
- Warn about kprobes taking a page fault.
- Fix the 'nmissed' stat counter.
- Misc cleanups and fixes.
* tag 'perf-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf: Fix task context PMU for Hetero
perf/x86/intel: Fix instructions:ppp support in Sapphire Rapids
perf/x86/intel: Add more events requires FRONTEND MSR on Sapphire Rapids
perf/x86/intel: Fix fixed counter check warning for some Alder Lake
perf/x86/intel: Fix PEBS-via-PT reload base value for Extended PEBS
perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task
kprobes: Do not increment probe miss count in the fault handler
x86,kprobes: WARN if kprobes tries to handle a fault
kprobes: Remove kprobe::fault_handler
uprobes: Update uprobe_write_opcode() kernel-doc comment
perf/hw_breakpoint: Fix DocBook warnings in perf hw_breakpoint
perf/core: Fix DocBook warnings
perf/core: Make local function perf_pmu_snapshot_aux() static
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX
perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on SNR
perf/x86/intel/uncore: Generalize I/O stacks to PMON mapping procedure
perf/x86/intel/uncore: Drop unnecessary NULL checks after container_of()
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), memmove(), and memset(), avoid
confusing the checks when using a static const source.
Move the static const array into a variable so the compiler can perform
appropriate bounds checking.
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210616201823.1245603-1-keescook@chromium.org
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current code doesn't clear the thread/group maps for offline
CPUs. This may cause kernel crashes like the one bewlow in common
code that assumes if a CPU has sibblings it is online.
Unable to handle kernel pointer dereference in virtual kernel address space
Call Trace:
[<000000013a4b8c3c>] blk_mq_map_swqueue+0x10c/0x388
([<000000013a4b8bcc>] blk_mq_map_swqueue+0x9c/0x388)
[<000000013a4b9300>] blk_mq_init_allocated_queue+0x448/0x478
[<000000013a4b9416>] blk_mq_init_queue+0x4e/0x90
[<000003ff8019d3e6>] loop_add+0x106/0x278 [loop]
[<000003ff801b8148>] loop_init+0x148/0x1000 [loop]
[<0000000139de4924>] do_one_initcall+0x3c/0x1e0
[<0000000139ef449a>] do_init_module+0x6a/0x2a0
[<0000000139ef61bc>] __do_sys_finit_module+0xa4/0xc0
[<0000000139de9e6e>] do_syscall+0x7e/0xd0
[<000000013a8e0aec>] __do_syscall+0xbc/0x110
[<000000013a8ee2e8>] system_call+0x78/0xa0
Fixes: 52aeda7acc ("s390/topology: remove offline CPUs from CPU topology masks")
Cc: <stable@kernel.org> # 5.7+
Reported-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The current irq entry code doesn't initialize pt_regs::flags. On exit to
user mode arch_do_signal_or_restart() tests whether PIF_SYSCALL is set,
which might yield wrong results.
Fix this by clearing pt_regs::flags in the entry.S irq handler
code.
Reported-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 56e62a7370 ("s390: convert to generic entry")
Cc: <stable@vger.kernel.org> # 5.12
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
glibc complained with "The futex facility returned an unexpected error
code.". It turned out that the futex syscall returned -ERESTARTSYS because
a signal is pending. arch_do_signal_or_restart() restored the syscall
parameters (nameley regs->gprs[2]) and set PIF_SYSCALL_RESTART. When
another signal is made pending later in the exit loop
arch_do_signal_or_restart() is called again. This function clears
PIF_SYSCALL_RESTART and checks the return code which is set in
regs->gprs[2]. However, regs->gprs[2] was restored in the previous run
and no longer contains -ERESTARTSYS, so PIF_SYSCALL_RESTART isn't set
again and the syscall is skipped.
Fix this by not clearing PIF_SYSCALL_RESTART - it is already cleared in
__do_syscall() when the syscall is restarted.
Reported-by: Bjoern Walk <bwalk@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Fixes: 56e62a7370 ("s390: convert to generic entry")
Cc: <stable@vger.kernel.org> # 5.12
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove register asm usage from diag8_noresponse() since it wasn't
needed at all. There is no requirement for even/odd register pairs for
diag 0x8.
For diag_response() use register pairs to fulfill the rx+1 and ry+1
requirements as required if a response buffer is specified. Also
change the inline asm to return the condition code of the diagnose
instruction and do the conditional handling of response length
calculation in C.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When read via debugfs, s390dbf debug-views print the kernel address of
the call-site that created a trace entry. The kernel's %p pointer
hashing feature obfuscates this address, and commit 860ec7c6e2
("s390/debug: use pK for kernel pointers") made this obfuscation
configurable via the kptr_restrict sysctl.
Obfuscation of kernel address data printed via s390dbf debug-views does
not add any additional protection since the associated debugfs files are
only accessible to the root user that typically has enough other means
to obtain kernel address data.
Also trace payload data may contain binary representations of kernel
addresses as part of logged data structues. Requiring such payload data
to be obfuscated as well would be impractical and greatly diminish the
use of s390dbf.
Therefore completely remove pointer obfuscation from s390dbf
debug-views.
Reviewed-by: Steffen Maier <maier@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Since OLDMEM_BASE/OLDMEM_SIZE is already taken into consideration and is
reflected in ident_map_size. reserve/remove_oldmem() is no longer needed
and could be removed.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Currently there are two separate places where kernel memory layout has
to be known and adjusted:
1. early kasan setup.
2. paging setup later.
Those 2 places had to be kept in sync and adjusted to reflect peculiar
technical details of one another. With additional factors which influence
kernel memory layout like ultravisor secure storage limit, complexity
of keeping two things in sync grew up even more.
Besides that if we look forward towards creating identity mapping and
enabling DAT before jumping into uncompressed kernel - that would also
require full knowledge of and control over kernel memory layout.
So, de-duplicate and move kernel memory layout setup logic into
the decompressor.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Replace a bunch of 'p->state == TASK_RUNNING' with a new helper:
task_is_running(p).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210611082838.222401495@infradead.org