The Arm errata covered by ARM64_WORKAROUND_CLEAN_CACHE require
that "dc cvau" instructions get promoted to "dc civac".
Reported-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210524083001.2586635-4-tabba@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Reliable stacktracing requires that we identify when a stacktrace is
terminated early. We can do this by ensuring all tasks have a final
frame record at a known location on their task stack, and checking
that this is the final frame record in the chain.
We'd like to use task_pt_regs(task)->stackframe as the final frame
record, as this is already setup upon exception entry from EL0. For
kernel tasks we need to consistently reserve the pt_regs and point x29
at this, which we can do with small changes to __primary_switched,
__secondary_switched, and copy_process().
Since the final frame record must be at a specific location, we must
create the final frame record in __primary_switched and
__secondary_switched rather than leaving this to start_kernel and
secondary_start_kernel. Thus, __primary_switched and
__secondary_switched will now show up in stacktraces for the idle tasks.
Since the final frame record is now identified by its location rather
than by its contents, we identify it at the start of unwind_frame(),
before we read any values from it.
External debuggers may terminate the stack trace when FP == 0. In the
pt_regs->stackframe, the PC is 0 as well. So, stack traces taken in the
debugger may print an extra record 0x0 at the end. While this is not
pretty, this does not do any harm. This is a small price to pay for
having reliable stack trace termination in the kernel. That said, gdb
does not show the extra record probably because it uses DWARF and not
frame pointers for stack traces.
Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
[Mark: rebase, use ASM_BUG(), update comments, update commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210510110026.18061-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
As pointed out by commit
de9b8f5dcb ("sched: Fix crash trying to dequeue/enqueue the idle thread")
init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.
As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().
Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().
Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
Secondary startups were patched via coccinelle:
@begone@
@@
-preempt_disable();
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
- Restore terminal stack frame records. Their previous removal caused
traces which cross secondary_start_kernel to terminate one entry too
late, with a spurious "0" entry.
- Fix boot warning with pseudo-NMI due to the way we manipulate the PMR
register.
- ACPI fixes: avoid corruption of interrupt mappings on watchdog probe
failure (GTDT), prevent unregistering of GIC SGIs.
- Force SPARSEMEM_VMEMMAP as the only memory model, it saves with having
to test all the other combinations.
- Documentation fixes and updates: tagged address ABI exceptions on
brk/mmap/mremap(), event stream frequency, update booting requirements
on the configuration of traps.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmCVba0ACgkQa9axLQDI
XvEClxAAsqigp+Mnotdr8YUOuXLjHWU41EMShV6WbFcmlViEyZxxtZ5qavw19T3L
rPxb8hq9QqI8kCd+j4MAU7cdc0ry+047njJmQ3Va0WeiDsbgEfPvLWPguDbeDFXW
EjKKib+F/u58IffDkn6rVA7ZVPgYHRH+8yw6EdApp0BN4JuxEFzGBzG4EWKXnNHH
IOu4IIXlbLX+U1kTtUFR4u6i4uBs2pZdEYzo1NF/Joacg14F01CBRuh8U04eeWFD
HF4pWd4eCl/bLYPurF1rOi1dIUyrPuaPgNInGEdSaocD0hIvQH0r0wyIt+aMmqvK
9Jm+dDEGeLxQn2nDrXfyldYG5EbFa3OplkUt2MVDDMWwN2Gpsjlnf/ucff/SBT/N
7D6AL2OH6KDDCsNgU1JH9H6rAlh4nWJcsMBrWmP7aQtBMRyccQLywrt4HXB8cy7E
+MyhTit05P3lpsrK2uZSFujK35Ts8hxywA7lAlU7YP4ADKu3Noc6qXSaxZRe+1Gb
O5k3Qdcih0VLE843PjJj8f8fW1ysJW5J60cK9BaZxpB77gNufKkh/hS6YAiA8qkt
PT3J0jk/cgGvwKK54rW52dG7qvDImgUMGkXGKQnEimgb62DatCZ4ZOPC+UoiDiqO
SEd1DSW0Lt1VxVIulAjatVgzIJGM0jGCm9L7/vBguR0+Lahakbg=
=vYok
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"A mix of fixes and clean-ups that turned up too late for the first
pull request:
- Restore terminal stack frame records. Their previous removal caused
traces which cross secondary_start_kernel to terminate one entry
too late, with a spurious "0" entry.
- Fix boot warning with pseudo-NMI due to the way we manipulate the
PMR register.
- ACPI fixes: avoid corruption of interrupt mappings on watchdog
probe failure (GTDT), prevent unregistering of GIC SGIs.
- Force SPARSEMEM_VMEMMAP as the only memory model, it saves with
having to test all the other combinations.
- Documentation fixes and updates: tagged address ABI exceptions on
brk/mmap/mremap(), event stream frequency, update booting
requirements on the configuration of traps"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: Update the stale comment
arm64: Fix the documented event stream frequency
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
arm64: Explicitly document boot requirements for SVE
arm64: Explicitly require that FPSIMD instructions do not trap
arm64: Relax booting requirements for configuration of traps
arm64: cpufeatures: use min and max
arm64: stacktrace: restore terminal records
arm64/vdso: Discard .note.gnu.property sections in vDSO
arm64: doc: Add brk/mmap/mremap() to the Tagged Address ABI Exceptions
psci: Remove unneeded semicolon
ACPI: irq: Prevent unregistering of GIC SGIs
ACPI: GTDT: Don't corrupt interrupt mappings on watchdow probe failure
arm64: Show three registers per line
arm64: remove HAVE_DEBUG_BUGVERBOSE
arm64: alternative: simplify passing alt_region
arm64: Force SPARSEMEM_VMEMMAP as the only memory management model
arm64: vdso32: drop -no-integrated-as flag
Commit af391b15f7 ("arm64: kernel: rename __cpu_suspend to keep it aligned with arm")
has used @index instead of @arg, but the comment is stale, update it.
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/1620280462-21937-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Zenghui reports that booting a kernel with "irqchip.gicv3_pseudo_nmi=1"
on the command line hits a warning during kernel entry, due to the way
we manipulate the PMR.
Early in the entry sequence, we call lockdep_hardirqs_off() to inform
lockdep that interrupts have been masked (as the HW sets DAIF wqhen
entering an exception). Architecturally PMR_EL1 is not affected by
exception entry, and we don't set GIC_PRIO_PSR_I_SET in the PMR early in
the exception entry sequence, so early in exception entry the PMR can
indicate that interrupts are unmasked even though they are masked by
DAIF.
If DEBUG_LOCKDEP is selected, lockdep_hardirqs_off() will check that
interrupts are masked, before we set GIC_PRIO_PSR_I_SET in any of the
exception entry paths, and hence lockdep_hardirqs_off() will WARN() that
something is amiss.
We can avoid this by consistently setting GIC_PRIO_PSR_I_SET during
exception entry so that kernel code sees a consistent environment. We
must also update local_daif_inherit() to undo this, as currently only
touches DAIF. For other paths, local_daif_restore() will update both
DAIF and the PMR. With this done, we can remove the existing special
cases which set this later in the entry code.
We always use (GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET) for consistency with
local_daif_save(), as this will warn if it ever encounters
(GIC_PRIO_IRQOFF | GIC_PRIO_PSR_I_SET), and never sets this itself. This
matches the gic_prio_kentry_setup that we have to retain for
ret_to_user.
The original splat from Zenghui's report was:
| DEBUG_LOCKS_WARN_ON(!irqs_disabled())
| WARNING: CPU: 3 PID: 125 at kernel/locking/lockdep.c:4258 lockdep_hardirqs_off+0xd4/0xe8
| Modules linked in:
| CPU: 3 PID: 125 Comm: modprobe Tainted: G W 5.12.0-rc8+ #463
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 604003c5 (nZCv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : lockdep_hardirqs_off+0xd4/0xe8
| lr : lockdep_hardirqs_off+0xd4/0xe8
| sp : ffff80002a39bad0
| pmr_save: 000000e0
| x29: ffff80002a39bad0 x28: ffff0000de214bc0
| x27: ffff0000de1c0400 x26: 000000000049b328
| x25: 0000000000406f30 x24: ffff0000de1c00a0
| x23: 0000000020400005 x22: ffff8000105f747c
| x21: 0000000096000044 x20: 0000000000498ef9
| x19: ffff80002a39bc88 x18: ffffffffffffffff
| x17: 0000000000000000 x16: ffff800011c61eb0
| x15: ffff800011700a88 x14: 0720072007200720
| x13: 0720072007200720 x12: 0720072007200720
| x11: 0720072007200720 x10: 0720072007200720
| x9 : ffff80002a39bad0 x8 : ffff80002a39bad0
| x7 : ffff8000119f0800 x6 : c0000000ffff7fff
| x5 : ffff8000119f07a8 x4 : 0000000000000001
| x3 : 9bcdab23f2432800 x2 : ffff800011730538
| x1 : 9bcdab23f2432800 x0 : 0000000000000000
| Call trace:
| lockdep_hardirqs_off+0xd4/0xe8
| enter_from_kernel_mode.isra.5+0x7c/0xa8
| el1_abort+0x24/0x100
| el1_sync_handler+0x80/0xd0
| el1_sync+0x6c/0x100
| __arch_clear_user+0xc/0x90
| load_elf_binary+0x9fc/0x1450
| bprm_execve+0x404/0x880
| kernel_execve+0x180/0x188
| call_usermodehelper_exec_async+0xdc/0x158
| ret_from_fork+0x10/0x18
Fixes: 23529049c6 ("arm64: entry: fix non-NMI user<->kernel transitions")
Fixes: 7cd1ea1010 ("arm64: entry: fix non-NMI kernel<->kernel transitions")
Fixes: f0cd5ac1e4 ("arm64: entry: fix NMI {user, kernel}->kernel transitions")
Fixes: 2a9b3e6ac6 ("arm64: entry: fix EL1 debug transitions")
Link: https://lore.kernel.org/r/f4012761-026f-4e51-3a0c-7524e434e8b3@huawei.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210428111555.50880-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation,
zap under read lock, enable/disably dirty page logging under
read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing
the architecture-specific code
- Some selftests improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK
y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b
c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+
Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY
+2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b
M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ==
=AXUi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"This is a large update by KVM standards, including AMD PSP (Platform
Security Processor, aka "AMD Secure Technology") and ARM CoreSight
(debug and trace) changes.
ARM:
- CoreSight: Add support for ETE and TRBE
- Stage-2 isolation for the host kernel when running in protected
mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
x86:
- AMD PSP driver changes
- Optimizations and cleanup of nested SVM code
- AMD: Support for virtual SPEC_CTRL
- Optimizations of the new MMU code: fast invalidation, zap under
read lock, enable/disably dirty page logging under read lock
- /dev/kvm API for AMD SEV live migration (guest API coming soon)
- support SEV virtual machines sharing the same encryption context
- support SGX in virtual machines
- add a few more statistics
- improved directed yield heuristics
- Lots and lots of cleanups
Generic:
- Rework of MMU notifier interface, simplifying and optimizing the
architecture-specific code
- a handful of "Get rid of oprofile leftovers" patches
- Some selftests improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits)
KVM: selftests: Speed up set_memory_region_test
selftests: kvm: Fix the check of return value
KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt()
KVM: SVM: Skip SEV cache flush if no ASIDs have been used
KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids()
KVM: SVM: Drop redundant svm_sev_enabled() helper
KVM: SVM: Move SEV VMCB tracking allocation to sev.c
KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup()
KVM: SVM: Unconditionally invoke sev_hardware_teardown()
KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported)
KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y
KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables
KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features
KVM: SVM: Move SEV module params/variables to sev.c
KVM: SVM: Disable SEV/SEV-ES if NPT is disabled
KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails
KVM: SVM: Zero out the VMCB array used to track SEV ASID association
x86/sev: Drop redundant and potentially misleading 'sev_enabled'
KVM: x86: Move reverse CPUID helpers to separate header file
KVM: x86: Rename GPR accessors to make mode-aware variants the defaults
...
Use min and max to make the effect more clear.
Generated by: scripts/coccinelle/misc/minmax.cocci
CC: Denis Efremov <efremov@linux.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2104292246300.16899@hadrien
[catalin.marinas@arm.com: include <linux/minmax.h> explicitly]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We removed the terminal frame records in commit:
6106e1112c ("arm64: remove EL0 exception frame record")
... on the assumption that as we no longer used them to find the pt_regs
at exception boundaries, they were no longer necessary.
However, Leo reports that as an unintended side-effect, this causes
traces which cross secondary_start_kernel to terminate one entry too
late, with a spurious "0" entry.
There are a few ways we could sovle this, but as we're planning to use
terminal records for RELIABLE_STACKTRACE, let's revert the logic change
for now, keeping the update comments and accounting for the changes in
commit:
3c02600144 ("arm64: stacktrace: Report when we reach the end of the stack")
This is effectively a partial revert of commit:
6106e1112c ("arm64: remove EL0 exception frame record")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 6106e1112c ("arm64: remove EL0 exception frame record")
Reported-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: "Madhavan T. Venkataraman" <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210429104813.GA33550@C02TD0UTHF1T.local
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The arm64 assembler in binutils 2.32 and above generates a program
property note in a note section, .note.gnu.property, to encode used x86
ISAs and features. But the kernel linker script only contains a single
NOTE segment:
PHDRS
{
text PT_LOAD FLAGS(5) FILEHDR PHDRS; /* PF_R|PF_X */
dynamic PT_DYNAMIC FLAGS(4); /* PF_R */
note PT_NOTE FLAGS(4); /* PF_R */
}
The NOTE segment generated by the vDSO linker script is aligned to 4 bytes.
But the .note.gnu.property section must be aligned to 8 bytes on arm64.
$ readelf -n vdso64.so
Displaying notes found in: .note
Owner Data size Description
Linux 0x00000004 Unknown note type: (0x00000000)
description data: 06 00 00 00
readelf: Warning: note with invalid namesz and/or descsz found at offset 0x20
readelf: Warning: type: 0x78, namesize: 0x00000100, descsize: 0x756e694c, alignment: 8
Since the note.gnu.property section in the vDSO is not checked by the
dynamic linker, discard the .note.gnu.property sections in the vDSO.
Similar to commit 4caffe6a28 ("x86/vdso: Discard .note.gnu.property
sections in vDSO"), but for arm64.
Signed-off-by: Bill Wendling <morbo@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210423205159.830854-1-morbo@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Refactoring powerpc and arm64 kexec DT handling to common code. This
enables IMA on arm64.
- Add kbuild support for applying DT overlays at build time. The first
user are the DT unittests.
- Fix kerneldoc formatting and W=1 warnings in drivers/of/
- Fix handling 64-bit flag on PCI resources
- Bump dtschema version required to v2021.2.1
- Enable undocumented compatible checks for dtbs_check. This allows
tracking of missing binding schemas.
- DT docs improvements. Regroup the DT docs and add the example schema
and DT kernel ABI docs to the doc build.
- Convert Broadcom Bluetooth and video-mux bindings to schema
- Add QCom sm8250 Venus video codec binding schema
- Add vendor prefixes for AESOP, YIC System Co., Ltd, and Siliconfile
Technologies Inc.
- Cleanup of DT schema type references on common properties and
standard unit properties
-----BEGIN PGP SIGNATURE-----
iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmCIYdgQHHJvYmhAa2Vy
bmVsLm9yZwAKCRD6+121jbxhw/PKEACkOCWDnLSY9U7w1uGDHr6UgXIWOY9j8bYy
2pTvDrVa6KZphT6yGU/hxrOk8Mqh5AMd2vUhO2OCoyyl/priTv+Ktqo+bikvJZLa
MQm3JnrLpPy/GetdmVD8wq1l+FoeOSTnRIJqRxInsd8UFVpZImtP22ELox6KgGiv
keVHIrjsHU/HpafK3w8wHCLikCZk+1Gl6pL/QgFDv2FaaCTKW16Dt64dPqYm49Xk
j7YMMQWl+3NJ9ywZV0+PMbl9udi3EjGm5Ap5VfKzpj53Nh07QObg/QtH/1sj0HPo
apyW7jAyQFyLytbjxzFL/tljtOeW/5rZos1GWThZ326e+Y0mTKUTDZShvNplfjIf
e26FvVi7gndWlRSr30Ia5gdNFAx72IkpJUAuypBXgb+qNPchBJjAXLn9tcIcg/k+
2R6BIB7SkVLpgTnJ1Bq1+PRqkKM+ggACdJNJIUApj44xoiG01vtGDGRaFuIio+Ch
HT4aBbic4kLvagm8VzuiIF/sL7af5pntzArcyOfQTaZ92DyGI2C0j90rK3yPRIYM
u9qX/24t1SXiUji74QpoQFzt/+Egy5hYXMJOJJSywUjKf7DBhehqklTjiJRQHKm6
0DJ/n8q4lNru8F0Y4keKSuYTfHBstF7fS3UTH/rUmBAbfEwkvZe6B29KQbs+7aph
GTw+jeoR5Q==
=rF27
-----END PGP SIGNATURE-----
Merge tag 'devicetree-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull devicetree updates from Rob Herring:
- Refactor powerpc and arm64 kexec DT handling to common code. This
enables IMA on arm64.
- Add kbuild support for applying DT overlays at build time. The first
user are the DT unittests.
- Fix kerneldoc formatting and W=1 warnings in drivers/of/
- Fix handling 64-bit flag on PCI resources
- Bump dtschema version required to v2021.2.1
- Enable undocumented compatible checks for dtbs_check. This allows
tracking of missing binding schemas.
- DT docs improvements. Regroup the DT docs and add the example schema
and DT kernel ABI docs to the doc build.
- Convert Broadcom Bluetooth and video-mux bindings to schema
- Add QCom sm8250 Venus video codec binding schema
- Add vendor prefixes for AESOP, YIC System Co., Ltd, and Siliconfile
Technologies Inc.
- Cleanup of DT schema type references on common properties and
standard unit properties
* tag 'devicetree-for-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (64 commits)
powerpc: If kexec_build_elf_info() fails return immediately from elf64_load()
powerpc: Free fdt on error in elf64_load()
of: overlay: Fix kerneldoc warning in of_overlay_remove()
of: linux/of.h: fix kernel-doc warnings
of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses
dt-bindings: bcm4329-fmac: add optional brcm,ccode-map
docs: dt: update writing-schema.rst references
dt-bindings: media: venus: Add sm8250 dt schema
of: base: Fix spelling issue with function param 'prop'
docs: dt: Add DT API documentation
of: Add missing 'Return' section in kerneldoc comments
of: Fix kerneldoc output formatting
docs: dt: Group DT docs into relevant sub-sections
docs: dt: Make 'Devicetree' wording more consistent
docs: dt: writing-schema: Include the example schema in the doc build
docs: dt: writing-schema: Remove spurious indentation
dt-bindings: Fix reference in submitting-patches.rst to the DT ABI doc
dt-bindings: ddr: Add optional manufacturer and revision ID to LPDDR3
dt-bindings: media: video-interfaces: Drop the example
devicetree: bindings: clock: Minor typo fix in the file armada3700-tbg-clock.txt
...
- Clean up list_sort prototypes (Sami Tolvanen)
- Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmCHCR8ACgkQiXL039xt
wCZyFQ//fnUZaXR2K354zDyW6CJljMf+d94RF6rH+J6eMTH2/HXa5v0iJokwABLf
ussP6qF4k5wtmI22Gm9A5Zc3e4iiry5pC0jOdk0mk4gzWwFN9MdgNxJZIGA3xqhS
bsBK4AGrVKjtZl48G1/ZxJuNDeJhVp6GNK2n6/Gl4rZF6R7D/Upz0XelyJRdDpcM
HIGma7jZl6xfGU0mdWCzpOGK1zdMca1WVs7A4YuurSbLn5PZJrcNVWLouDqt/Si2
AduSri1gyPClicgvqWjMOzhUpuw/nJtBLRl1x1EsWk/KSZ1/uNVjlewfzdN4fZrr
zbtFr2gLubYLK6JOX7/LqoHlOTgE3tYLL+WIVN75DsOGZBKgHhmebTmWLyqzV0SL
oqcyM5d3ucC6msdtAK5Fv4MSp8rpjqlK1Ha4SGRT6kC2wut7AhZ3KD7eyRIz8mV9
Sa9mhignGFJnTEUp+LSbYdrAudgSKxB40WyXPmswAXX4VJFRD4ONrrcAON/SzkUT
Hw/JdFRCKkJjgwNQjIQoZcUNMTbFz2PlNIEnjJWm38YImQKQlCb2mXaZKCwBkf45
aheCZk17eKoxTCXFMd+KxlyNEtS2yBfq/PpZgvw7GW/pfFbWUg1+2O41LnihIe5v
zu0hN1wNCQqgfxiMZqX1OTb9C/2vybzGsXILt+9nppjZ8EBU7iU=
=wU6U
-----END PGP SIGNATURE-----
Merge tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull CFI on arm64 support from Kees Cook:
"This builds on last cycle's LTO work, and allows the arm64 kernels to
be built with Clang's Control Flow Integrity feature. This feature has
happily lived in Android kernels for almost 3 years[1], so I'm excited
to have it ready for upstream.
The wide diffstat is mainly due to the treewide fixing of mismatched
list_sort prototypes. Other things in core kernel are to address
various CFI corner cases. The largest code portion is the CFI runtime
implementation itself (which will be shared by all architectures
implementing support for CFI). The arm64 pieces are Acked by arm64
maintainers rather than coming through the arm64 tree since carrying
this tree over there was going to be awkward.
CFI support for x86 is still under development, but is pretty close.
There are a handful of corner cases on x86 that need some improvements
to Clang and objtool, but otherwise works well.
Summary:
- Clean up list_sort prototypes (Sami Tolvanen)
- Introduce CONFIG_CFI_CLANG for arm64 (Sami Tolvanen)"
* tag 'cfi-v5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
arm64: allow CONFIG_CFI_CLANG to be selected
KVM: arm64: Disable CFI for nVHE
arm64: ftrace: use function_nocfi for ftrace_call
arm64: add __nocfi to __apply_alternatives
arm64: add __nocfi to functions that jump to a physical address
arm64: use function_nocfi with __pa_symbol
arm64: implement function_nocfi
psci: use function_nocfi for cpu_resume
lkdtm: use function_nocfi
treewide: Change list_sort to use const pointers
bpf: disable CFI in dispatcher functions
kallsyms: strip ThinLTO hashes from static functions
kthread: use WARN_ON_FUNCTION_MISMATCH
workqueue: use WARN_ON_FUNCTION_MISMATCH
module: ensure __cfi_check alignment
mm: add generic function_nocfi macro
cfi: add __cficanonical
add support for Clang CFI
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate
cpufreq driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx
cpufreq driver and drop the CPU PM clock .set_parent method for
armada-37xx (Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values
in cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags()
to avoid issuse in some cases when the device depends on an ACPI
power resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks()
to pm_runtime.h and drop the unused try_to_freeze_nowarn()
definition (YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in
the wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check
during resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to
the new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmCHAUISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxAxMP/0tFjgxeaJ3chYaiqoPlk2QX/XdwqJvm
8OOu2qBMWbt2bubcIlAPpdlCNaERI4itF7E8za7t9alswdq7YPWGmNR9snCXUKhD
BzERuicZTeOcCk2P3DTgzLVc4EzF6wutA3lTdYYZIpf+LuuB+guG8zgMzScRHIsM
N3I83O+iLTA9ifIqN0/wH//a0ISvo6rSWtcbx+48d5bYvYixW7CsBmoxWHhGiYsw
4PJ4AzbdNJEhQp91SBYPIAmqwV88FZUPoYnRazXMxOSevMewhP9JuCHXAujC3gLV
l5d2TBaBV4EBYLD5tfCpJvHMXhv/yBpg6KRivjri+zEnY1TAqIlfR4vYiL7puVvQ
PdwjyvNFDNGyUSX/AAwYF6F4WCtIhw8hCahw6Dw2zcDz0plFdRZmWAiTdmIjECJK
8EvwJNlmdl27G1y+EBc6NnwzEFZQwiu9F5bUHUkmc3fF1M1aFTza8WDNDo30TC94
94c+uVBRL2fBePl4FfGZATfJbOMb8+vDIkroQxrIjQDT/7Ha3Mz75JZDRHItZo92
+4fES2eFdfZISCLIQMBY5TeXox3O8qsirC1k4qELwy71gPUE9CpP3FkxKIvyZLlv
+6fS9ttpUfyFBF7gqrEy+ziVk1Rm4oorLmWCtthz4xyerzj5+vtZQqKzcySk0GA5
hYkseZkedR6y
=t+SG
-----END PGP SIGNATURE-----
Merge tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add some new hardware support (for example, IceLake-D idle
states in intel_idle), fix some issues (for example, the handling of
negative "sleep length" values in cpuidle governors), add new
functionality to the existing drivers (for example, scale-invariance
support in the ACPI CPPC cpufreq driver) and clean up code all over.
Specifics:
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate cpufreq
driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx cpufreq
driver and drop the CPU PM clock .set_parent method for armada-37xx
(Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values in
cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags() to
avoid issuse in some cases when the device depends on an ACPI power
resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks() to
pm_runtime.h and drop the unused try_to_freeze_nowarn() definition
(YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in the
wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check during
resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to the
new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda)"
* tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
PM: wakeup: remove redundant assignment to variable retval
PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
cpufreq: Kconfig: fix documentation links
PM: wakeup: use dev_set_name() directly
PM: runtime: Add documentation for pm_runtime_resume_and_get()
cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
cpufreq: armada-37xx: Fix module unloading
cpufreq: armada-37xx: Remove cur_frequency variable
cpufreq: armada-37xx: Fix determining base CPU frequency
cpufreq: armada-37xx: Fix driver cleanup when registration failed
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix setting TBG parent for load levels
cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
cpuidle: tegra: Remove do_idle firmware call
cpuidle: tegra: Fix C7 idling state on Tegra114
PM: sleep: fix typos in comments
cpufreq: Remove unused for_each_policy macro
...
- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not allow
precise identification of the illegal access.
- Run kernel mode SIMD with softirqs disabled. This allows using NEON in
softirq context for crypto performance improvements. The conditional
yield support is modified to take softirqs into account and reduce the
latency.
- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.
- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers, new
functions for the HiSilicon HHA and L3C PMU, cleanups.
- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.
- Disable fine-grained traps at boot and improve the documented boot
requirements.
- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).
- Add hierarchical eXecute Never permissions for all page tables.
- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.
- arm64 kselftests for BTI and some improvements to the MTE tests.
- Minor improvements to the compat vdso and sigpage.
- Miscellaneous cleanups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmB5xkkACgkQa9axLQDI
XvEBgRAAsr6r8gsBQJP3FDHmbtbVf2ej5QJTCOAQAGHbTt0JH7Pk03pWSBr7h5nF
vsddRDxxeDgB6xd7jWP7EvDaPxHeB0CdSj5gG8EP/ZdOm8sFAwB1ZIHWikgUgSwW
nu6R28yXTMSj+EkyFtahMhTMJ1EMF4sCPuIgAo59ST5w/UMMqLCJByOu4ej6RPKZ
aeSJJWaDLBmbgnTKWxRvCc/MgIx4J/LAHWGkdpGjuMK6SLp38Kdf86XcrklXtzwf
K30ZYeoKq8zZ+nFOsK9gBVlOlocZcbS1jEbN842jD6imb6vKLQtBWrKk9A6o4v5E
XulORWcSBhkZb3ItIU9+6SmelUExf0VeVlSp657QXYPgquoIIGvFl6rCwhrdGMGO
bi6NZKCfJvcFZJoIN1oyhuHejgZSBnzGEcvhvzNdg7ItvOCed7q3uXcGHz/OI6tL
2TZKddzHSEMVfTo0D+RUsYfasZHI1qAiQ0mWVC31c+YHuRuW/K/jlc3a5TXlSBUa
Dwu0/zzMLiqx65ISx9i7XNMrngk55uzrS6MnwSByPoz4M4xsElZxt3cbUxQ8YAQz
jhxTHs1Pwes8i7f4n61ay/nHCFbmVvN/LlsPRpZdwd8JumThLrDolF3tc6aaY0xO
hOssKtnGY4Xvh/WitfJ5uvDb1vMObJKTXQEoZEJh4hlNQDxdeUE=
=6NGI
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- MTE asynchronous support for KASan. Previously only synchronous
(slower) mode was supported. Asynchronous is faster but does not
allow precise identification of the illegal access.
- Run kernel mode SIMD with softirqs disabled. This allows using NEON
in softirq context for crypto performance improvements. The
conditional yield support is modified to take softirqs into account
and reduce the latency.
- Preparatory patches for Apple M1: handle CPUs that only have the VHE
mode available (host kernel running at EL2), add FIQ support.
- arm64 perf updates: support for HiSilicon PA and SLLC PMU drivers,
new functions for the HiSilicon HHA and L3C PMU, cleanups.
- Re-introduce support for execute-only user permissions but only when
the EPAN (Enhanced Privileged Access Never) architecture feature is
available.
- Disable fine-grained traps at boot and improve the documented boot
requirements.
- Support CONFIG_KASAN_VMALLOC on arm64 (only with KASAN_GENERIC).
- Add hierarchical eXecute Never permissions for all page tables.
- Add arm64 prctl(PR_PAC_{SET,GET}_ENABLED_KEYS) allowing user programs
to control which PAC keys are enabled in a particular task.
- arm64 kselftests for BTI and some improvements to the MTE tests.
- Minor improvements to the compat vdso and sigpage.
- Miscellaneous cleanups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (86 commits)
arm64/sve: Add compile time checks for SVE hooks in generic functions
arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG.
arm64: pac: Optimize kernel entry/exit key installation code paths
arm64: Introduce prctl(PR_PAC_{SET,GET}_ENABLED_KEYS)
arm64: mte: make the per-task SCTLR_EL1 field usable elsewhere
arm64/sve: Remove redundant system_supports_sve() tests
arm64: fpsimd: run kernel mode NEON with softirqs disabled
arm64: assembler: introduce wxN aliases for wN registers
arm64: assembler: remove conditional NEON yield macros
kasan, arm64: tests supports for HW_TAGS async mode
arm64: mte: Report async tag faults before suspend
arm64: mte: Enable async tag check fault
arm64: mte: Conditionally compile mte_enable_kernel_*()
arm64: mte: Enable TCO in functions that can read beyond buffer limits
kasan: Add report for async mode
arm64: mte: Drop arch_enable_tagging()
kasan: Add KASAN mode kernel parameter
arm64: mte: Add asynchronous mode support
arm64: Get rid of CONFIG_ARM64_VHE
arm64: Cope with CPUs stuck in VHE mode
...
Provide support for randomized stack offsets per syscall to make
stack-based attacks harder which rely on the deterministic stack layout.
The feature is based on the original idea of PaX's RANDSTACK feature, but
uses a significantly different implementation.
The offset does not affect the pt_regs location on the task stack as this
was agreed on to be of dubious value. The offset is applied before the
actual syscall is invoked.
The offset is stored per cpu and the randomization happens at the end of
the syscall which is less predictable than on syscall entry.
The mechanism to apply the offset is via alloca(), i.e. abusing the
dispised VLAs. This comes with the drawback that stack-clash-protection
has to be disabled for the affected compilation units and there is also
a negative interaction with stack-protector.
Those downsides are traded with the advantage that this approach does not
require any intrusive changes to the low level assembly entry code, does
not affect the unwinder and the correct stack alignment is handled
automatically by the compiler.
The feature is guarded with a static branch which avoids the overhead when
disabled.
Currently this is supported for X86 and ARM64.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGjz8THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWsvD/4tGnPAurd6lbzxWzRjW7jOOVyzkODM
UXtIxxICaj7o6MNcloaGe1QtJ8+QOCw3yPQfLG/SoWHse5+oUKQRL9dmWVeJyRSt
JZ1pirkKqWrB+OmPbJKUiO3/TsZ2Z/vO41JVgVTL5/HWhOECSDzZsJkuvF/H+qYD
ReDzd7FUNd76pwVOsXq/cxXclRa81/wMNZRVwmyAwFYE2XoPtQyTERTLrfj6aQKF
P0txr9fEjYlPPwYOk1kjBAoJfDltNm48BBL7CGZtRlsqpNpdsJ1MkeGffhodb6F0
pJYQMlQJHXABZb5GF+v93+iASDpRFn0EvPmLkCxQUfZYLOkRsnuEF2S/fsYX/WPo
uin/wQKwLVdeQq9d9BwlZUKEgsQuV7Q0GVN+JnEQerwD6cWTxv4a1RIUH+K/4Wo5
nTeJVRKcs6m7UkGQRm8JbqnUP0vCV+PSiWWB8J9CmjYeCPbkGjt6mBIsmPaDZ9VL
4i+UX5DJayoREF/rspOBcJftUmExize49p9860UI9N6fd7DsDt7Dq9Ai+ADtZa4C
9BPbF4NWzJq8IWLqBi+PpKBAT3JMX9qQi7s9sbrRxpxtew9Keu5qggKZJYumX71V
qgUMk+xB86HZOrtF6F3oY0zxYv3haPvDydsDgqojtqNGk4PdAdgDYJQwMlb8QSly
SwIWPHIfvP4R9w==
=GMlJ
-----END PGP SIGNATURE-----
Merge tag 'x86-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull entry code update from Thomas Gleixner:
"Provide support for randomized stack offsets per syscall to make
stack-based attacks harder which rely on the deterministic stack
layout.
The feature is based on the original idea of PaX's RANDSTACK feature,
but uses a significantly different implementation.
The offset does not affect the pt_regs location on the task stack as
this was agreed on to be of dubious value. The offset is applied
before the actual syscall is invoked.
The offset is stored per cpu and the randomization happens at the end
of the syscall which is less predictable than on syscall entry.
The mechanism to apply the offset is via alloca(), i.e. abusing the
dispised VLAs. This comes with the drawback that
stack-clash-protection has to be disabled for the affected compilation
units and there is also a negative interaction with stack-protector.
Those downsides are traded with the advantage that this approach does
not require any intrusive changes to the low level assembly entry
code, does not affect the unwinder and the correct stack alignment is
handled automatically by the compiler.
The feature is guarded with a static branch which avoids the overhead
when disabled.
Currently this is supported for X86 and ARM64"
* tag 'x86-entry-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arm64: entry: Enable random_kstack_offset support
lkdtm: Add REPORT_STACK for checking stack offsets
x86/entry: Enable random_kstack_offset support
stack: Optionally randomize kernel stack offset each syscall
init_on_alloc: Optimize static branches
jump_label: Provide CONFIG-driven build state defaults
eliminate custom code patching. For that, the alternatives infra is
extended to accomodate paravirt's needs and, as a result, a lot of
paravirt patching code goes away, leading to a sizeable cleanup and
simplification. Work by Juergen Gross.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGiXQACgkQEsHwGGHe
VUocbw/+OkFzphK6zlNA8O3RJ24u2csXUWWUtpGlZ2220Nn/Bgyso2+fyg/NEeQg
EmEttaY3JG/riCDfHk5Xm2saeVtsbPXN4f0sJm/Io/djF7Cm03WS0eS0aA2Rnuca
MhmvvkrzYqZXAYVaxKkIH6sNlPgyXX7vDNPbTd/0ZCOb3ZKIyXwL+SaLatMCtE5o
ou7e8Bj8xPSwcaCyK6sqjrT6jdpPjoTrxxrwENW8AlRu5lCU1pIY03GGhARPVoEm
fWkZsIPn7DxhpyIqzJtEMX8EK1xN96E+NGkNuSAtJGP9HRb+3j5f4s3IUAfXiLXq
r7NecFw8zHhPKl9J0pPCiW7JvMrCMU5xGwyeUmmhKyK2BxwvvAC173ohgMlCfB2Q
FPIsQWemat17tSue8LIA8SmlSDQz6R+tTdUFT+vqmNV34PxOIEeSdV7HG8rs87Ec
dYB9ENUgXqI+h2t7atE68CpTLpWXzNDcq2olEsaEUXenky2hvsi+VxNkWpmlKQ3I
NOMU/AyH8oUzn5O0o3oxdPhDLmK5ItEFxjYjwrgLfKFQ+Y8vIMMq3LrKQGwOj+ZU
n9qC7JjOwDKZGjd3YqNNRhnXp+w0IJvUHbyr3vIAcp8ohQwEKgpUvpZzf/BKUvHh
nJgJSJ53GFJBbVOJMfgVq+JcFr+WO8MDKHaw6zWeCkivFZdSs4g=
=h+km
-----END PGP SIGNATURE-----
Merge tag 'x86_alternatives_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 alternatives/paravirt updates from Borislav Petkov:
"First big cleanup to the paravirt infra to use alternatives and thus
eliminate custom code patching.
For that, the alternatives infrastructure is extended to accomodate
paravirt's needs and, as a result, a lot of paravirt patching code
goes away, leading to a sizeable cleanup and simplification.
Work by Juergen Gross"
* tag 'x86_alternatives_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/paravirt: Have only one paravirt patch function
x86/paravirt: Switch functions with custom code to ALTERNATIVE
x86/paravirt: Add new PVOP_ALT* macros to support pvops in ALTERNATIVEs
x86/paravirt: Switch iret pvops to ALTERNATIVE
x86/paravirt: Simplify paravirt macros
x86/paravirt: Remove no longer needed 32-bit pvops cruft
x86/paravirt: Add new features for paravirt patching
x86/alternative: Use ALTERNATIVE_TERNARY() in _static_cpu_has()
x86/alternative: Support ALTERNATIVE_TERNARY
x86/alternative: Support not-feature
x86/paravirt: Switch time pvops functions to use static_call()
static_call: Add function to query current function
static_call: Move struct static_call_key definition to static_call_types.h
x86/alternative: Merge include files
x86/alternative: Drop unused feature parameter from ALTINSTR_REPLACEMENT()
Displaying two registers per line takes 15 lines. That improves to just
10 lines if we display three registers per line, which reduces the amount
of information lost when oopses are cut off. It stays within 80 columns
and matches x86-64.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210420172245.3679077-1-willy@infradead.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In __apply_alternatives() we take a pointer to void which we later
assign to a pointer to struct alt_region. As all callers are passing a
pointer to struct alt_region to begin with, it's simpler and more robust
to take a pointer to struct alt region, so let's do so and avoid the
need for a temporary variable.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210416163032.10857-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Clang can assemble these files just fine; this is a relic from the top
level Makefile conditionally adding this. We no longer need --prefix,
--gcc-toolchain, or -Qunused-arguments flags either with this change, so
remove those too.
To test building:
$ ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- \
CROSS_COMPILE_COMPAT=arm-linux-gnueabi- make LLVM=1 LLVM_IAS=1 \
defconfig arch/arm64/kernel/vdso32/
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Tested-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210420174427.230228-1-ndesaulniers@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmCCpuAPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpD2G8QALWQYeBggKnNmAJfuihzZ2WariBmgcENs2R2
qNZ/Py6dIF+b69P68nmgrEV1x2Kp35cPJbBwXnnrS4FCB5tk0b8YMaj00QbiRIYV
UXbPxQTmYO1KbevpoEcw8NmR4bZJ/hRYPuzcQG7CCMKIZw0zj2cMcBofzQpTOAp/
CgItdcv7at3iwamQatfU9vUmC0nDdnjdIwSxTAJOYMVV1ENwtnYSNgZVo4XLTg7n
xR/5Qx27PKBJw7GyTRAIIxKAzNXG2tDL+GVIHe4AnRp3z3La8sr6PJf7nz9MCmco
ISgeY7EGQINzmm4LahpnV+2xwwxOWo8QotxRFGNuRTOBazfARyAbp97yJ6eXJUpa
j0qlg3xK9neyIIn9BQKkKx4sY9V45yqkuVDsK6odmqPq3EE01IMTRh1N/XQi+sTF
iGrlM3ZW4AjlT5zgtT9US/FRXeDKoYuqVCObJeXZdm3sJSwEqTAs0JScnc0YTsh7
m30CODnomfR2y5X6GoaubbQ0wcZ2I20K1qtIm+2F6yzD5P1/3Yi8HbXMxsSWyYWZ
1ldoSa+ZUQlzV9Ot0S3iJ4PkphLKmmO96VlxE2+B5gQG50PZkLzsr8bVyYOuJC8p
T83xT9xd07cy+FcGgF9veZL99Y6BLHMa6ZwFUolYNbzJxqrmqyR1aiJMEBIcX+aP
ACeKW1w5
=fpey
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 5.13
New features:
- Stage-2 isolation for the host kernel when running in protected mode
- Guest SVE support when running in nVHE mode
- Force W^X hypervisor mappings in nVHE mode
- ITS save/restore for guests using direct injection with GICv4.1
- nVHE panics now produce readable backtraces
- Guest support for PTP using the ptp_kvm driver
- Performance improvements in the S2 fault handler
- Alexandru is now a reviewer (not really a new feature...)
Fixes:
- Proper emulation of the GICR_TYPER register
- Handle the complete set of relocation in the nVHE EL2 object
- Get rid of the oprofile dependency in the PMU code (and of the
oprofile body parts at the same time)
- Debug and SPE fixes
- Fix vcpu reset
CONFIG_KASAN_STACK and CONFIG_KASAN_STACK_ENABLE both enable KASAN stack
instrumentation, but we should only need one config, so that we remove
CONFIG_KASAN_STACK_ENABLE and make CONFIG_KASAN_STACK workable. see [1].
When enable KASAN stack instrumentation, then for gcc we could do no
prompt and default value y, and for clang prompt and default value n.
This patch fixes the following compilation warning:
include/linux/kasan.h:333:30: warning: 'CONFIG_KASAN_STACK' is not defined, evaluates to 0 [-Wundef]
[akpm@linux-foundation.org: fix merge snafu]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=210221 [1]
Link: https://lkml.kernel.org/r/20210226012531.29231-1-walter-zh.wu@mediatek.com
Fixes: d9b571c885 ("kasan: fix KASAN_STACK dependency for HW_TAGS")
Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com>
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* for-next/misc:
: Miscellaneous patches
arm64/sve: Add compile time checks for SVE hooks in generic functions
arm64/kernel/probes: Use BUG_ON instead of if condition followed by BUG.
arm64/sve: Remove redundant system_supports_sve() tests
arm64: mte: Remove unused mte_assign_mem_tag_range()
arm64: Add __init section marker to some functions
arm64/sve: Rework SVE access trap to convert state in registers
docs: arm64: Fix a grammar error
arm64: smp: Add missing prototype for some smp.c functions
arm64: setup: name `tcr` register
arm64: setup: name `mair` register
arm64: stacktrace: Move start_backtrace() out of the header
arm64: barrier: Remove spec_bar() macro
arm64: entry: remove test_irqs_unmasked macro
ARM64: enable GENERIC_FIND_FIRST_BIT
arm64: defconfig: Use DEBUG_INFO_REDUCED
* for-next/kselftest:
: Various kselftests for arm64
kselftest: arm64: Add BTI tests
kselftest/arm64: mte: Report filename on failing temp file creation
kselftest/arm64: mte: Fix clang warning
kselftest/arm64: mte: Makefile: Fix clang compilation
kselftest/arm64: mte: Output warning about failing compiler
kselftest/arm64: mte: Use cross-compiler if specified
kselftest/arm64: mte: Fix MTE feature detection
kselftest/arm64: mte: common: Fix write() warnings
kselftest/arm64: mte: user_mem: Fix write() warning
kselftest/arm64: mte: ksm_options: Fix fscanf warning
kselftest/arm64: mte: Fix pthread linking
kselftest/arm64: mte: Fix compilation with native compiler
* for-next/xntable:
: Add hierarchical XN permissions for all page tables
arm64: mm: use XN table mapping attributes for user/kernel mappings
arm64: mm: use XN table mapping attributes for the linear region
arm64: mm: add missing P4D definitions and use them consistently
* for-next/vdso:
: Minor improvements to the compat vdso and sigpage
arm64: compat: Poison the compat sigpage
arm64: vdso: Avoid ISB after reading from cntvct_el0
arm64: compat: Allow signal page to be remapped
arm64: vdso: Remove redundant calls to flush_dcache_page()
arm64: vdso: Use GFP_KERNEL for allocating compat vdso and signal pages
* for-next/fiq:
: Support arm64 FIQ controller registration
arm64: irq: allow FIQs to be handled
arm64: Always keep DAIF.[IF] in sync
arm64: entry: factor irq triage logic into macros
arm64: irq: rework root IRQ handler registration
arm64: don't use GENERIC_IRQ_MULTI_HANDLER
genirq: Allow architectures to override set_handle_irq() fallback
* for-next/epan:
: Support for Enhanced PAN (execute-only permissions)
arm64: Support execute-only permissions with Enhanced PAN
* for-next/kasan-vmalloc:
: Support CONFIG_KASAN_VMALLOC on arm64
arm64: Kconfig: select KASAN_VMALLOC if KANSAN_GENERIC is enabled
arm64: kaslr: support randomized module area with KASAN_VMALLOC
arm64: Kconfig: support CONFIG_KASAN_VMALLOC
arm64: kasan: abstract _text and _end to KERNEL_START/END
arm64: kasan: don't populate vmalloc area for CONFIG_KASAN_VMALLOC
* for-next/fgt-boot-init:
: Booting clarifications and fine grained traps setup
arm64: Require that system registers at all visible ELs be initialized
arm64: Disable fine grained traps on boot
arm64: Document requirements for fine grained traps at boot
* for-next/vhe-only:
: Dealing with VHE-only CPUs (a.k.a. M1)
arm64: Get rid of CONFIG_ARM64_VHE
arm64: Cope with CPUs stuck in VHE mode
arm64: cpufeature: Allow early filtering of feature override
* arm64/for-next/perf:
arm64: perf: Remove redundant initialization in perf_event.c
perf/arm_pmu_platform: Clean up with dev_printk
perf/arm_pmu_platform: Fix error handling
perf/arm_pmu_platform: Use dev_err_probe() for IRQ errors
docs: perf: Address some html build warnings
docs: perf: Add new description on HiSilicon uncore PMU v2
drivers/perf: hisi: Add support for HiSilicon PA PMU driver
drivers/perf: hisi: Add support for HiSilicon SLLC PMU driver
drivers/perf: hisi: Update DDRC PMU for programmable counter
drivers/perf: hisi: Add new functions for HHA PMU
drivers/perf: hisi: Add new functions for L3C PMU
drivers/perf: hisi: Add PMU version for uncore PMU drivers.
drivers/perf: hisi: Refactor code for more uncore PMUs
drivers/perf: hisi: Remove unnecessary check of counter index
drivers/perf: Simplify the SMMUv3 PMU event attributes
drivers/perf: convert sysfs sprintf family to sysfs_emit
drivers/perf: convert sysfs scnprintf family to sysfs_emit_at() and sysfs_emit()
drivers/perf: convert sysfs snprintf family to sysfs_emit
* for-next/neon-softirqs-disabled:
: Run kernel mode SIMD with softirqs disabled
arm64: fpsimd: run kernel mode NEON with softirqs disabled
arm64: assembler: introduce wxN aliases for wN registers
arm64: assembler: remove conditional NEON yield macros
The FPSIMD code was relying on IS_ENABLED() checks in system_suppors_sve()
to cause the compiler to delete references to SVE functions in some places,
add explicit IS_ENABLED() checks back.
Fixes: ef9c5d0979 ("arm64/sve: Remove redundant system_supports_sve() tests")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210415121742.36628-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fix incorrect asm constraint for load_unaligned_zeropad() fixup
- Fix thread flag update when setting TIF_MTE_ASYNC_FAULT
- Fix restored irq state when handling fault on kprobe
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmB2sKMQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNEViB/47lQncMY12/AFYk16Dn3qgY7wvrZNLEPoz
fCUr05So46OyRhzIfOhakCPRQzXYapMTWJzPDL6Ok9VmdawSf1UA80oNs02UAQsA
63j4/jEXC+rICS42mbfbwNgNA7BNx9Ek0AR778iqQglwrVEVhMF5M1epgaGOairG
4HH2NMzrP6P60gRwGVOsGrmpDfXMMuX45DE1ca8EKN5TnDGTfczrZ8sw8MrVTSDe
cT5H++qOdH8NalqlOeOB9XcHoL2Af4OG4LJQOlg3UZztQm6jnBrNe/VU9IgLPQl4
kb+HP6qTtMAVOMHn6w6glK93rXaK2PCOOfL3yFyZfHc1Q1xAMU4S
=3J2L
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix incorrect asm constraint for load_unaligned_zeropad() fixup
- Fix thread flag update when setting TIF_MTE_ASYNC_FAULT
- Fix restored irq state when handling fault on kprobe
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kprobes: Restore local irqflag if kprobes is cancelled
arm64: mte: Ensure TIF_MTE_ASYNC_FAULT is set atomically
arm64: fix inline asm in load_unaligned_zeropad()
The kernel does not use any keys besides IA so we don't need to
install IB/DA/DB/GA on kernel exit if we arrange to install them
on task switch instead, which we can expect to happen an order of
magnitude less often.
Furthermore we can avoid installing the user IA in the case where the
user task has IA disabled and just leave the kernel IA installed. This
also lets us avoid needing to install IA on kernel entry.
On an Apple M1 under a hypervisor, the overhead of kernel entry/exit
has been measured to be reduced by 15.6ns in the case where IA is
enabled, and 31.9ns in the case where IA is disabled.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ieddf6b580d23c9e0bed45a822dabe72d2ffc9a8e
Link: https://lore.kernel.org/r/2d653d055f38f779937f2b92f8ddd5cf9e4af4f4.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This change introduces a prctl that allows the user program to control
which PAC keys are enabled in a particular task. The main reason
why this is useful is to enable a userspace ABI that uses PAC to
sign and authenticate function pointers and other pointers exposed
outside of the function, while still allowing binaries conforming
to the ABI to interoperate with legacy binaries that do not sign or
authenticate pointers.
The idea is that a dynamic loader or early startup code would issue
this prctl very early after establishing that a process may load legacy
binaries, but before executing any PAC instructions.
This change adds a small amount of overhead to kernel entry and exit
due to additional required instruction sequences.
On a DragonBoard 845c (Cortex-A75) with the powersave governor, the
overhead of similar instruction sequences was measured as 4.9ns when
simulating the common case where IA is left enabled, or 43.7ns when
simulating the uncommon case where IA is disabled. These numbers can
be seen as the worst case scenario, since in more realistic scenarios
a better performing governor would be used and a newer chip would be
used that would support PAC unlike Cortex-A75 and would be expected
to be faster than Cortex-A75.
On an Apple M1 under a hypervisor, the overhead of the entry/exit
instruction sequences introduced by this patch was measured as 0.3ns
in the case where IA is left enabled, and 33.0ns in the case where
IA is disabled.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Link: https://linux-review.googlesource.com/id/Ibc41a5e6a76b275efbaa126b31119dc197b927a5
Link: https://lore.kernel.org/r/d6609065f8f40397a4124654eb68c9f490b4d477.1616123271.git.pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently there are a number of places in the SVE code where we check both
system_supports_sve() and TIF_SVE. This is a bit redundant given that we
should never get into a situation where we have set TIF_SVE without having
SVE support and it is not clear that silently ignoring a mistakenly set
TIF_SVE flag is the most sensible error handling approach. For now let's
just drop the system_supports_sve() checks since this will at least reduce
overhead a little.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210412172320.3315-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If instruction being single stepped caused a page fault, the kprobes
is cancelled to let the page fault handler continue as a normal page
fault. But the local irqflags are disabled so cpu will restore pstate
with DAIF masked. After pagefault is serviced, the kprobes is
triggerred again, we overwrite the saved_irqflag by calling
kprobes_save_local_irqflag(). NOTE, DAIF is masked in this new saved
irqflag. After kprobes is serviced, the cpu pstate is retored with
DAIF masked.
This patch is inspired by one patch for riscv from Liao Chang.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20210412174101.6bfb0594@xhacker.debian
Signed-off-by: Will Deacon <will@kernel.org>
Pull ARM cpufreq updates for v5.13 from Viresh Kumar:
"- Fix typos in s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from
1000 MHz (Pali Rohár and Marek Behún).
- cpufreq-dt: Return -EPROBE_DEFER on failure to add table (Quanyang
Wang).
- Minor cleanup in cppc driver (Tom Saeger).
- Add frequency invariance support for CPPC driver and generalize
freq invariance support arch-topology driver (Viresh Kumar)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: armada-37xx: Fix module unloading
cpufreq: armada-37xx: Remove cur_frequency variable
cpufreq: armada-37xx: Fix determining base CPU frequency
cpufreq: armada-37xx: Fix driver cleanup when registration failed
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix setting TBG parent for load levels
cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER
cpufreq: cppc: simplify default delay_us setting
cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c
cpufreq: CPPC: Add support for frequency invariance
arch_topology: Export arch_freq_scale and helpers
arch_topology: Allow multiple entities to provide sched_freq_tick() callback
arch_topology: Rename freq_scale as arch_freq_scale
The entry from EL0 code checks the TFSRE0_EL1 register for any
asynchronous tag check faults in user space and sets the
TIF_MTE_ASYNC_FAULT flag. This is not done atomically, potentially
racing with another CPU calling set_tsk_thread_flag().
Replace the non-atomic ORR+STR with an STSET instruction. While STSET
requires ARMv8.1 and an assembler that understands LSE atomics, the MTE
feature is part of ARMv8.5 and already requires an updated assembler.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 637ec831ea ("arm64: mte: Handle synchronous and asynchronous tag check faults")
Cc: <stable@vger.kernel.org> # 5.10.x
Reported-by: Will Deacon <will@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210409173710.18582-1-catalin.marinas@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Kernel mode NEON can be used in task or softirq context, but only in
a non-nesting manner, i.e., softirq context is only permitted if the
interrupt was not taken at a point where the kernel was using the NEON
in task context.
This means all users of kernel mode NEON have to be aware of this
limitation, and either need to provide scalar fallbacks that may be much
slower (up to 20x for AES instructions) and potentially less safe, or
use an asynchronous interface that defers processing to a later time
when the NEON is guaranteed to be available.
Given that grabbing and releasing the NEON is cheap, we can relax this
restriction, by increasing the granularity of kernel mode NEON code, and
always disabling softirq processing while the NEON is being used in task
context.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210302090118.30666-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When MTE async mode is enabled TFSR_EL1 contains the accumulative
asynchronous tag check faults for EL1 and EL0.
During the suspend/resume operations the firmware might perform some
operations that could change the state of the register resulting in
a spurious tag check fault report.
Report asynchronous tag faults before suspend and clear the TFSR_EL1
register after resume to prevent this to happen.
Cc: Will Deacon <will@kernel.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210315132019.33202-9-vincenzo.frascino@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MTE provides a mode that asynchronously updates the TFSR_EL1 register
when a tag check exception is detected.
To take advantage of this mode the kernel has to verify the status of
the register at:
1. Context switching
2. Return to user/EL0 (Not required in entry from EL0 since the kernel
did not run)
3. Kernel entry from EL1
4. Kernel exit to EL1
If the register is non-zero a trace is reported.
Add the required features for EL1 detection and reporting.
Note: ITFSB bit is set in the SCTLR_EL1 register hence it guaranties that
the indirect writes to TFSR_EL1 are synchronized at exception entry to
EL1. On the context switch path the synchronization is guarantied by the
dsb() in __switch_to().
The dsb(nsh) in mte_check_tfsr_exit() is provisional pending
confirmation by the architects.
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210315132019.33202-8-vincenzo.frascino@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
mte_enable_kernel_*() are not needed if KASAN_HW is disabled.
Add ash defines around the functions to conditionally compile the
functions.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210315132019.33202-7-vincenzo.frascino@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
load_unaligned_zeropad() and __get/put_kernel_nofault() functions can
read past some buffer limits which may include some MTE granule with a
different tag.
When MTE async mode is enabled, the load operation crosses the boundaries
and the next granule has a different tag the PE sets the TFSR_EL1.TF1 bit
as if an asynchronous tag fault is happened.
Enable Tag Check Override (TCO) in these functions before the load and
disable it afterwards to prevent this to happen.
Note: The same condition can be hit in MTE sync mode but we deal with it
through the exception handling.
In the current implementation, mte_async_mode flag is set only at boot
time but in future kasan might acquire some runtime features that
that change the mode dynamically, hence we disable it when sync mode is
selected for future proof.
Cc: Will Deacon <will@kernel.org>
Reported-by: Branislav Rankov <Branislav.Rankov@arm.com>
Tested-by: Branislav Rankov <Branislav.Rankov@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210315132019.33202-6-vincenzo.frascino@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
MTE provides an asynchronous mode for detecting tag exceptions. In
particular instead of triggering a fault the arm64 core updates a
register which is checked by the kernel after the asynchronous tag
check fault has occurred.
Add support for MTE asynchronous mode.
The exception handling mechanism will be added with a future patch.
Note: KASAN HW activates async mode via kasan.mode kernel parameter.
The default mode is set to synchronous.
The code that verifies the status of TFSR_EL1 will be added with a
future patch.
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210315132019.33202-2-vincenzo.frascino@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
With CONFIG_CFI_CLANG, the compiler replaces function pointers with
jump table addresses, which breaks dynamic ftrace as the address of
ftrace_call is replaced with the address of ftrace_call.cfi_jt. Use
function_nocfi() to get the address of the actual function instead.
Suggested-by: Ben Dai <ben.dai@unisoc.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-17-samitolvanen@google.com
__apply_alternatives makes indirect calls to functions whose address
is taken in assembly code using the alternative_cb macro. With
non-canonical CFI, the compiler won't replace these function
references with the jump table addresses, which trips CFI. Disable CFI
checking in the function to work around the issue.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-16-samitolvanen@google.com
Disable CFI checking for functions that switch to linear mapping and
make an indirect call to a physical address, since the compiler only
understands virtual addresses and the CFI check for such indirect calls
would always fail.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-15-samitolvanen@google.com
With CONFIG_CFI_CLANG, the compiler replaces function address
references with the address of the function's CFI jump table
entry. This means that __pa_symbol(function) returns the physical
address of the jump table entry, which can lead to address space
confusion as the jump table points to the function's virtual
address. Therefore, use the function_nocfi() macro to ensure we are
always taking the address of the actual function instead.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-14-samitolvanen@google.com
CONFIG_ARM64_VHE was introduced with ARMv8.1 (some 7 years ago),
and has been enabled by default for almost all that time.
Given that newer systems that are VHE capable are finally becoming
available, and that some systems are even incapable of not running VHE,
drop the configuration altogether.
Anyone willing to stick to non-VHE on VHE hardware for obscure
reasons should use the 'kvm-arm.mode=nvhe' command-line option.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210408131010.1109027-4-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
It seems that the CPUs part of the SoC known as Apple M1 have the
terrible habit of being stuck with HCR_EL2.E2H==1, in violation
of the architecture.
Try and work around this deplorable state of affairs by detecting
the stuck bit early and short-circuit the nVHE dance. Additional
filtering code ensures that attempts at switching to nVHE from
the command-line are also ignored.
It is still unknown whether there are many more such nuggets
to be found...
Reported-by: Hector Martin <marcan@marcan.st>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210408131010.1109027-3-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some CPUs are broken enough that some overrides need to be rejected
at the earliest opportunity. In some cases, that's right at cpu
feature override time.
Provide the necessary infrastructure to filter out overrides,
and to report such filtered out overrides to the core cpufeature code.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210408131010.1109027-2-maz@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
They are not needed after booting, so mark them as __init to move them
to the .init section.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20210330135449.4dcffd7f@xhacker.debian
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
When we enable SVE usage in userspace after taking a SVE access trap we
need to ensure that the portions of the register state that are not
shared with the FPSIMD registers are zeroed. Currently we do this by
forcing the FPSIMD registers to be saved to the task struct and converting
them there. This is wasteful in the common case where the task state is
loaded into the registers and we will immediately return to userspace
since we can initialise the SVE state directly in registers instead of
accessing multiple copies of the register state in memory.
Instead in that common case do the conversion in the registers and
update the task metadata so that we can return to userspace without
spilling the register state to memory unless there is some other reason
to do so.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210312190313.24598-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Allow for a randomized stack offset on a per-syscall basis, with roughly
5 bits of entropy. (And include AAPCS rationale AAPCS thanks to Mark
Rutland.)
In order to avoid unconditional stack canaries on syscall entry (due to
the use of alloca()), also disable stack protector to avoid triggering
needless checks and slowing down the entry path. As there is no general
way to control stack protector coverage with a function attribute[1],
this must be disabled at the compilation unit level. This isn't a problem
here, though, since stack protector was not triggered before: examining
the resulting syscall.o, there are no changes in canary coverage (none
before, none now).
[1] a working __attribute__((no_stack_protector)) has been added to GCC
and Clang but has not been released in any version yet:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=346b302d09c1e6db56d9fe69048acb32fbb97845https://reviews.llvm.org/rG4fbf84c1732fca596ad1d6e96015e19760eb8a9b
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210401232347.2791257-6-keescook@chromium.org
For a nvhe host, the EL2 must allow the EL1&0 translation
regime for TraceBuffer (MDCR_EL2.E2TB == 0b11). This must
be saved/restored over a trip to the guest. Also, before
entering the guest, we must flush any trace data if the
TRBE was enabled. And we must prohibit the generation
of trace while we are in EL1 by clearing the TRFCR_EL1.
For vhe, the EL2 must prevent the EL1 access to the Trace
Buffer.
The MDCR_EL2 bit definitions for TRBE are available here :
https://developer.arm.com/documentation/ddi0601/2020-12/AArch64-Registers/
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210405164307.1720226-8-suzuki.poulose@arm.com
Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
The initialization of value in function armv8pmu_read_hw_counter()
and armv8pmu_read_counter() seem redundant, as they are soon updated.
So, We can remove them.
Signed-off-by: Qi Liu <liuqi115@huawei.com>
Link: https://lore.kernel.org/r/1617275801-1980-1-git-send-email-liuqi115@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
To aid with debugging, add details of the source of a panic from nVHE
hyp. This is done by having nVHE hyp exit to nvhe_hyp_panic_handler()
rather than directly to panic(). The handler will then add the extra
details for debugging before panicking the kernel.
If the panic was due to a BUG(), look up the metadata to log the file
and line, if available, otherwise log an address that can be looked up
in vmlinux. The hyp offset is also logged to allow other hyp VAs to be
converted, similar to how the kernel offset is logged during a panic.
__hyp_panic_string is now inlined since it no longer needs to be
referenced as a symbol and the message is free to diverge between VHE
and nVHE.
The following is an example of the logs generated by a BUG in nVHE hyp.
[ 46.754840] kvm [307]: nVHE hyp BUG at: arch/arm64/kvm/hyp/nvhe/switch.c:242!
[ 46.755357] kvm [307]: Hyp Offset: 0xfffea6c58e1e0000
[ 46.755824] Kernel panic - not syncing: HYP panic:
[ 46.755824] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.755824] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.755824] VCPU:0000d93a880d0000
[ 46.756960] CPU: 3 PID: 307 Comm: kvm-vcpu-0 Not tainted 5.12.0-rc3-00005-gc572b99cf65b-dirty #133
[ 46.757459] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 46.758366] Call trace:
[ 46.758601] dump_backtrace+0x0/0x1b0
[ 46.758856] show_stack+0x18/0x70
[ 46.759057] dump_stack+0xd0/0x12c
[ 46.759236] panic+0x16c/0x334
[ 46.759426] arm64_kernel_unmapped_at_el0+0x0/0x30
[ 46.759661] kvm_arch_vcpu_ioctl_run+0x134/0x750
[ 46.759936] kvm_vcpu_ioctl+0x2f0/0x970
[ 46.760156] __arm64_sys_ioctl+0xa8/0xec
[ 46.760379] el0_svc_common.constprop.0+0x60/0x120
[ 46.760627] do_el0_svc+0x24/0x90
[ 46.760766] el0_svc+0x2c/0x54
[ 46.760915] el0_sync_handler+0x1a4/0x1b0
[ 46.761146] el0_sync+0x170/0x180
[ 46.761889] SMP: stopping secondary CPUs
[ 46.762786] Kernel Offset: 0x3e1cd2820000 from 0xffff800010000000
[ 46.763142] PHYS_OFFSET: 0xffffa9f680000000
[ 46.763359] CPU features: 0x00240022,61806008
[ 46.763651] Memory Limit: none
[ 46.813867] ---[ end Kernel panic - not syncing: HYP panic:
[ 46.813867] PS:400003c9 PC:0000d93a82c705ac ESR:f2000800
[ 46.813867] FAR:0000000080080000 HPFAR:0000000000800800 PAR:0000000000000000
[ 46.813867] VCPU:0000d93a880d0000 ]---
Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210318143311.839894-6-ascull@google.com
After KASAN_VMALLOC works in arm64, we can randomize module region
into vmalloc area now.
Test:
VMALLOC area ffffffc010000000 fffffffdf0000000
before the patch:
module_alloc_base/end ffffffc008b80000 ffffffc010000000
after the patch:
module_alloc_base/end ffffffdcf4bed000 ffffffc010000000
And the function that insmod some modules is fine.
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Link: https://lore.kernel.org/r/20210324040522.15548-5-lecopzer.chen@mediatek.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently start_backtrace() is a static inline function in the header.
Since it really shouldn't be sufficiently performance critical that we
actually need to have it inlined move it into a C file, this will save
anyone else scratching their head about why it is defined in the header.
As far as I can see it's only there because it was factored out of the
various callers.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210319174022.33051-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Enhanced Privileged Access Never (EPAN) allows Privileged Access Never
to be used with Execute-only mappings.
Absence of such support was a reason for 24cecc3774 ("arm64: Revert
support for execute-only user mappings"). Thus now it can be revisited
and re-enabled.
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210312173811.58284-2-vladimir.murzin@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fix possible memory hotplug failure with KASLR
- Fix FFR value in SVE kselftest
- Fix backtraces reported in /proc/$pid/stack
- Disable broken CnP implementation on NVIDIA Carmel
- Typo fixes and ACPI documentation clarification
- Fix some W=1 warnings
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmBccr0QHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNG6UCACDbz3BO/y40wRhWwMhvDhyFDqtlTlVEQlb
hxnJzksXOlbqHB1J7yamzXxS1UlCBlhvjrFNTe1s5LJIfB0niMskYLe2p0dJ/voi
WyysvaiK7/1bZV/RRdF7r+hFtMPHBEAKfgs+ZxFN9mnMcserV8PWqiD5ookCqavE
xatE/fEgVujiISl/BOkP1pnmWnPM4f9BIMS5DgaZJsNDYtxeu9a3RGnfu9vNHaP2
gxq5+E3BjZfh1z0++HP6nTuDbdDaxEz12gyoZ+4wejXVhwj1g7NySJNa8RmJG9pU
gX+jE6HOgeCFIEe9Gx+I2QtAaFia96HVnAAHagGBHB1vfV7GTRxN
=tzbO
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Minor fixes all over, ranging from typos to tests to errata
workarounds:
- Fix possible memory hotplug failure with KASLR
- Fix FFR value in SVE kselftest
- Fix backtraces reported in /proc/$pid/stack
- Disable broken CnP implementation on NVIDIA Carmel
- Typo fixes and ACPI documentation clarification
- Fix some W=1 warnings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kernel: disable CNP on Carmel
arm64/process.c: fix Wmissing-prototypes build warnings
kselftest/arm64: sve: Do not use non-canonical FFR register value
arm64: mm: correct the inside linear map range during hotplug check
arm64: kdump: update ppos when reading elfcorehdr
arm64: cpuinfo: Fix a typo
Documentation: arm64/acpi : clarify arm64 support of IBFT
arm64: stacktrace: don't trace arch_stack_walk()
arm64: csum: cast to the proper type
We haven't needed the test_irqs_unmasked macro since commit:
105fc33520 ("arm64: entry: move el1 irq/nmi logic to C")
... and as we convert more of the entry logic to C it is decreasingly
likely we'll need it in future, so let's remove the unused macro.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210323181201.18889-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the read_ctr macro has been specialised for nVHE,
the whole CPU_FTR_REG_HYP_COPY infrastrcture looks completely
overengineered.
Simplify it by populating the two u64 quantities (MMFR0 and 1)
that the hypervisor need.
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
In protected mode, late CPUs are not allowed to boot (enforced by
the PSCI relay). We can thus specialise the read_ctr macro to
always return a pre-computed, sanitised value. Special care is
taken to prevent the use of this custome version outside of
the protected mode.
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
On NVIDIA Carmel cores, CNP behaves differently than it does on standard
ARM cores. On Carmel, if two cores have CNP enabled and share an L2 TLB
entry created by core0 for a specific ASID, a non-shareable TLBI from
core1 may still see the shared entry. On standard ARM cores, that TLBI
will invalidate the shared entry as well.
This causes issues with patchsets that attempt to do local TLBIs based
on cpumasks instead of broadcast TLBIs. Avoid these issues by disabling
CNP support for NVIDIA Carmel cores.
Signed-off-by: Rich Wiley <rwiley@nvidia.com>
Link: https://lore.kernel.org/r/20210324002809.30271-1-rwiley@nvidia.com
[will: Fix pre-existing whitespace issue]
Signed-off-by: Will Deacon <will@kernel.org>
On contemporary platforms we don't use FIQ, and treat any stray FIQ as a
fatal event. However, some platforms have an interrupt controller wired
to FIQ, and need to handle FIQ as part of regular operation.
So that we can support both cases dynamically, this patch updates the
FIQ exception handling code to operate the same way as the IRQ handling
code, with its own handle_arch_fiq handler. Where a root FIQ handler is
not registered, an unexpected FIQ exception will trigger the default FIQ
handler, which will panic() as today. Where a root FIQ handler is
registered, handling of the FIQ is deferred to that handler.
As el0_fiq_invalid_compat is supplanted by el0_fiq, the former is
removed. For !CONFIG_COMPAT builds we never expect to take an exception
from AArch32 EL0, so we keep the common el0_fiq_invalid handler.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210315115629.57191-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Apple SoCs (A11 and newer) have some interrupt sources hardwired to the
FIQ line. We implement support for this by simply treating IRQs and FIQs
the same way in the interrupt vectors.
To support these systems, the FIQ mask bit needs to be kept in sync with
the IRQ mask bit, so both kinds of exceptions are masked together. No
other platforms should be delivering FIQ exceptions right now, and we
already unmask FIQ in normal process context, so this should not have an
effect on other systems - if spurious FIQs were arriving, they would
already panic the kernel.
Signed-off-by: Hector Martin <marcan@marcan.st>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210315115629.57191-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In subsequent patches we'll allow an FIQ handler to be registered, and
FIQ exceptions will need to be triaged very similarly to IRQ exceptions.
So that we can reuse the existing logic, this patch factors the IRQ
triage logic out into macros that can be reused for FIQ.
The macros are named to follow the elX_foo_handler scheme used by the C
exception handlers. For consistency with other top-level exception
handlers, the kernel_entry/kernel_exit logic is not moved into the
macros. As FIQ will use a different C handler, this handler name is
provided as an argument to the macros.
There should be no functional change as a result of this patch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
[Mark: rework macros, commit message, rebase before DAIF rework]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: James Morse <james.morse@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210315115629.57191-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If we accidentally unmask IRQs before we've registered a root IRQ
handler, handle_arch_irq will be NULL, and the IRQ exception handler
will branch to a bogus address.
To make this easier to debug, this patch initialises handle_arch_irq to
a default handler which will panic(), making such problems easier to
debug. When we add support for FIQ handlers, we can follow the same
approach.
When we add support for a root FIQ handler, it's possible to have root
IRQ handler without an root FIQ handler, and in theory the inverse is
also possible. To permit this, and to keep the IRQ/FIQ registration
logic similar, this patch removes the panic in the absence of a root IRQ
controller. Instead, set_handle_irq() logs when a handler is registered,
which is sufficient for debug purposes.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: James Morse <james.morse@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210315115629.57191-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In subsequent patches we want to allow irqchip drivers to register as
FIQ handlers, with a set_handle_fiq() function. To keep the IRQ/FIQ
paths similar, we want arm64 to provide both set_handle_irq() and
set_handle_fiq(), rather than using GENERIC_IRQ_MULTI_HANDLER for the
former.
This patch adds an arm64-specific implementation of set_handle_irq().
There should be no functional change as a result of this patch.
Signed-off-by: Marc Zyngier <maz@kernel.org>
[Mark: use a single handler pointer]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Hector Martin <marcan@marcan.st>
Cc: James Morse <james.morse@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210315115629.57191-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently we advertise the ID_AA6DFR0_EL1.TRACEVER for the guest,
when the trace register accesses are trapped (CPTR_EL2.TTA == 1).
So, the guest will get an undefined instruction, if trusts the
ID registers and access one of the trace registers.
Lets be nice to the guest and hide the feature to avoid
unexpected behavior.
Even though this can be done at KVM sysreg emulation layer,
we do this by removing the TRACEVER from the sanitised feature
register field. This is fine as long as the ETM drivers
can handle the individual trace units separately, even
when there are differences among the CPUs.
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210323120647.454211-2-suzuki.poulose@arm.com
Commit 9c698bff66 ("ARM: ensure the signal page contains defined contents")
poisoned the unused portions of the signal page for 32-bit Arm.
Implement the same poisoning for the compat signal page on arm64 rather
than using __GFP_ZERO.
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210318170738.7756-6-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For compatability with 32-bit Arm, allow the compat signal page to be
remapped via mremap().
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210318170738.7756-4-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
flush_dcache_page() ensures that the 'PG_dcache_clean' flag for its
'page' argument is clear so that cache maintenance will be performed
if the page is mapped into userspace with execute permissions.
Newly allocated pages have this flag clear, so there is no need to call
flush_dcache_page() for the compat vdso or signal pages. Remove the
redundant calls.
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210318170738.7756-3-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
There's no need to allocate the compat vDSO and signal pages using
GFP_ATOMIC allocations, so use GFP_KERNEL instead.
Signed-off-by: Will Deacon <will@kernel.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210318170738.7756-2-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The ppos points to a position in the old kernel memory (and in case of
arm64 in the crash kernel since elfcorehdr is passed as a segment). The
function should update the ppos by the amount that was read. This bug is
not exposed by accident, but other platforms update this value properly.
So, fix it in ARM64 version of elfcorehdr_read() as well.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Fixes: e62aaeac42 ("arm64: kdump: provide /proc/vmcore file")
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20210319205054.743368-1-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
We recently converted arm64 to use arch_stack_walk() in commit:
5fc57df2f6 ("arm64: stacktrace: Convert to ARCH_STACKWALK")
The core stacktrace code expects that (when tracing the current task)
arch_stack_walk() starts a trace at its caller, and does not include
itself in the trace. However, arm64's arch_stack_walk() includes itself,
and so traces include one more entry than callers expect. The core
stacktrace code which calls arch_stack_walk() tries to skip a number of
entries to prevent itself appearing in a trace, and the additional entry
prevents skipping one of the core stacktrace functions, leaving this in
the trace unexpectedly.
We can fix this by having arm64's arch_stack_walk() begin the trace with
its caller. The first value returned by the trace will be
__builtin_return_address(0), i.e. the caller of arch_stack_walk(). The
first frame record to be unwound will be __builtin_frame_address(1),
i.e. the caller's frame record. To prevent surprises, arch_stack_walk()
is also marked noinline.
While __builtin_frame_address(1) is not safe in portable code, local GCC
developers have confirmed that it is safe on arm64. To find the caller's
frame record, the builtin can safely dereference the current function's
frame record or (in theory) could stash the original FP into another GPR
at function entry time, neither of which are problematic.
Prior to this patch, the tracing code would unexpectedly show up in
traces of the current task, e.g.
| # cat /proc/self/stack
| [<0>] stack_trace_save_tsk+0x98/0x100
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180
After this patch, the tracing code will not show up in such traces:
| # cat /proc/self/stack
| [<0>] proc_pid_stack+0xb4/0x130
| [<0>] proc_single_show+0x60/0x110
| [<0>] seq_read_iter+0x230/0x4d0
| [<0>] seq_read+0xdc/0x130
| [<0>] vfs_read+0xac/0x1e0
| [<0>] ksys_read+0x6c/0xfc
| [<0>] __arm64_sys_read+0x20/0x30
| [<0>] el0_svc_common.constprop.0+0x60/0x120
| [<0>] do_el0_svc+0x24/0x90
| [<0>] el0_svc+0x2c/0x54
| [<0>] el0_sync_handler+0x1a4/0x1b0
| [<0>] el0_sync+0x170/0x180
Erring on the side of caution, I've given this a spin with a bunch of
toolchains, verifying the output of /proc/self/stack and checking that
the assembly looked sound. For GCC (where we require version 5.1.0 or
later) I tested with the kernel.org crosstool binares for versions
5.5.0, 6.4.0, 6.5.0, 7.3.0, 7.5.0, 8.1.0, 8.3.0, 8.4.0, 9.2.0, and
10.1.0. For clang (where we require version 10.0.1 or later) I tested
with the llvm.org binary releases of 11.0.0, and 11.0.1.
Fixes: 5fc57df2f6 ("arm64: stacktrace: Convert to ARCH_STACKWALK")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Jun <chenjun102@huawei.com>
Cc: Marco Elver <elver@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org> # 5.10.x
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210319184106.5688-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
We will soon unmap the .hyp sections from the host stage 2 in Protected
nVHE mode, which obviously works with at least page granularity, so make
sure to align them correctly.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-37-qperret@google.com
When KVM runs in protected nVHE mode, make use of a stage 2 page-table
to give the hypervisor some control over the host memory accesses. The
host stage 2 is created lazily using large block mappings if possible,
and will default to page mappings in absence of a better solution.
>From this point on, memory accesses from the host to protected memory
regions (e.g. not 'owned' by the host) are fatal and lead to hyp_panic().
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-36-qperret@google.com
Move the registers relevant to host stage 2 enablement to
kvm_nvhe_init_params to prepare the ground for enabling it in later
patches.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-22-qperret@google.com
When memory protection is enabled, the EL2 code needs the ability to
create and manage its own page-table. To do so, introduce a new set of
hypercalls to bootstrap a memory management system at EL2.
This leads to the following boot flow in nVHE Protected mode:
1. the host allocates memory for the hypervisor very early on, using
the memblock API;
2. the host creates a set of stage 1 page-table for EL2, installs the
EL2 vectors, and issues the __pkvm_init hypercall;
3. during __pkvm_init, the hypervisor re-creates its stage 1 page-table
and stores it in the memory pool provided by the host;
4. the hypervisor then extends its stage 1 mappings to include a
vmemmap in the EL2 VA space, hence allowing to use the buddy
allocator introduced in a previous patch;
5. the hypervisor jumps back in the idmap page, switches from the
host-provided page-table to the new one, and wraps up its
initialization by enabling the new allocator, before returning to
the host.
6. the host can free the now unused page-table created for EL2, and
will now need to issue hypercalls to make changes to the EL2 stage 1
mappings instead of modifying them directly.
Note that for the sake of simplifying the review, this patch focuses on
the hypervisor side of things. In other words, this only implements the
new hypercalls, but does not make use of them from the host yet. The
host-side changes will follow in a subsequent patch.
Credits to Will for __pkvm_init_switch_pgd.
Acked-by: Will Deacon <will@kernel.org>
Co-authored-by: Will Deacon <will@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-18-qperret@google.com
Introduce the infrastructure in KVM enabling to copy CPU feature
registers into EL2-owned data-structures, to allow reading sanitised
values directly at EL2 in nVHE.
Given that only a subset of these features are being read by the
hypervisor, the ones that need to be copied are to be listed under
<asm/kvm_cpufeature.h> together with the name of the nVHE variable that
will hold the copy. This introduces only the infrastructure enabling
this copy. The first users will follow shortly.
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-14-qperret@google.com
Currently, the hyp code cannot make full use of a bss, as the kernel
section is mapped read-only.
While this mapping could simply be changed to read-write, it would
intermingle even more the hyp and kernel state than they currently are.
Instead, introduce a __hyp_bss section, that uses reserved pages, and
create the appropriate RW hyp mappings during KVM init.
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-8-qperret@google.com
Pull clear_page(), copy_page(), memcpy() and memset() into the nVHE hyp
code and ensure that we always execute the '__pi_' entry point on the
offchance that it changes in future.
[ qperret: Commit title nits and added linker script alias ]
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210319100146.1149909-3-qperret@google.com
Instead of doing a RMW on SCTLR_EL1 to disable the MMU, use the
existing define that loads the right set of bits.
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmBLsyoUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMpYgf/Zu1Byif+XZVdwm52wJN38ppUUVmn
4u8HvQ8Ht+P0cGg1IaNx9D5QXGRgdn72qEpWUF5aH03ahTANAuf6zXw+evKmiub/
RtJfxZWEcWeLdugLVHUSrR4MOox7uvFmCdcdht4sEPdjFdH/9JeceC3A1pZ/DYTR
+eS+E3dMWQjXnd2Omo/5f5H1LTZjNLEditnkcHT5unwKKukc008V/avgs8xOAKJB
xf3oqJF960IO+NYf8rRQb8WtyGeo0grrWjgeqvZ37gwGUaFB9ldVxchsVLsL66OR
bJRIoSiTgL+TUYSMQ5mKG4tmmBnPHUHfgfNoOXlWMoJHIjFeQ9oM6eTHhA==
=QTFW
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"More fixes for ARM and x86"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: LAPIC: Advancing the timer expiration on guest initiated write
KVM: x86/mmu: Skip !MMU-present SPTEs when removing SP in exclusive mode
KVM: kvmclock: Fix vCPUs > 64 can't be online/hotpluged
kvm: x86: annotate RCU pointers
KVM: arm64: Fix exclusive limit for IPA size
KVM: arm64: Reject VM creation when the default IPA size is unsupported
KVM: arm64: Ensure I-cache isolation between vcpus of a same VM
KVM: arm64: Don't use cbz/adr with external symbols
KVM: arm64: Fix range alignment when walking page tables
KVM: arm64: Workaround firmware wrongly advertising GICv2-on-v3 compatibility
KVM: arm64: Rename __vgic_v3_get_ich_vtr_el2() to __vgic_v3_get_gic_config()
KVM: arm64: Don't access PMSELR_EL0/PMUSERENR_EL0 when no PMU is available
KVM: arm64: Turn kvm_arm_support_pmu_v3() into a static key
KVM: arm64: Fix nVHE hyp panic host context restore
KVM: arm64: Avoid corrupting vCPU context register in guest exit
KVM: arm64: nvhe: Save the SPE context early
kvm: x86: use NULL instead of using plain integer as pointer
KVM: SVM: Connect 'npt' module param to KVM's internal 'npt_enabled'
KVM: x86: Ensure deadline timer has truly expired before posting its IRQ
The time pvops functions are the only ones left which might be
used in 32-bit mode and which return a 64-bit value.
Switch them to use the static_call() mechanism instead of pvops, as
this allows quite some simplification of the pvops implementation.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311142319.4723-5-jgross@suse.com
52-bit VA kernels can run on hardware that is only 48-bit capable, but
configure the ID map as 52-bit by default. This was not a problem until
recently, because the special T0SZ value for a 52-bit VA space was never
programmed into the TCR register anwyay, and because a 52-bit ID map
happens to use the same number of translation levels as a 48-bit one.
This behavior was changed by commit 1401bef703 ("arm64: mm: Always update
TCR_EL1 from __cpu_set_tcr_t0sz()"), which causes the unsupported T0SZ
value for a 52-bit VA to be programmed into TCR_EL1. While some hardware
simply ignores this, Mark reports that Amberwing systems choke on this,
resulting in a broken boot. But even before that commit, the unsupported
idmap_t0sz value was exposed to KVM and used to program TCR_EL2 incorrectly
as well.
Given that we already have to deal with address spaces being either 48-bit
or 52-bit in size, the cleanest approach seems to be to simply default to
a 48-bit VA ID map, and only switch to a 52-bit one if the placement of the
kernel in DRAM requires it. This is guaranteed not to happen unless the
system is actually 52-bit VA capable.
Fixes: 90ec95cda9 ("arm64: mm: Introduce VA_BITS_MIN")
Reported-by: Mark Salter <msalter@redhat.com>
Link: http://lore.kernel.org/r/20210310003216.410037-1-msalter@redhat.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20210310171515.416643-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Commit 0fdf1bb759 ("arm64: perf: Avoid PMXEV* indirection") changed
armv8pmu_read_evcntr() to return a u32 instead of u64. The result is
silent truncation of the event counter when using 64-bit counters. Given
the offending commit appears to have passed thru several folks, it seems
likely this was a bad rebase after v8.5 PMU 64-bit counters landed.
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 0fdf1bb759 ("arm64: perf: Avoid PMXEV* indirection")
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210310004412.1450128-1-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
As per ARM ARM DDI 0487G.a, when FEAT_LPA2 is implemented, ID_AA64MMFR0_EL1
might contain a range of values to describe supported translation granules
(4K and 16K pages sizes in particular) instead of just enabled or disabled
values. This changes __enable_mmu() function to handle complete acceptable
range of values (depending on whether the field is signed or unsigned) now
represented with ID_AA64MMFR0_TGRAN_SUPPORTED_[MIN..MAX] pair. While here,
also fix similar situations in EFI stub and KVM as well.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-efi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/1615355590-21102-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
This patch attempts to make it generic enough so other parts of the
kernel can also provide their own implementation of scale_freq_tick()
callback, which is called by the scheduler periodically to update the
per-cpu arch_freq_scale variable.
The implementations now need to provide 'struct scale_freq_data' for the
CPUs for which they have hardware counters available, and a callback
gets registered for each possible CPU in a per-cpu variable.
The arch specific (or ARM AMU) counters are updated to adapt to this and
they take the highest priority if they are available, i.e. they will be
used instead of CPPC based counters for example.
The special code to rebuild the sched domains, in case invariance status
change for the system, is moved out of arm64 specific code and is added
to arch_topology.c.
Note that this also defines SCALE_FREQ_SOURCE_CPUFREQ but doesn't use it
and it is added to show that cpufreq is also acts as source of
information for FIE and will be used by default if no other counters are
supported for a platform.
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Will Deacon <will@kernel.org> # for arm64
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Rename freq_scale to a less generic name, as it will get exported soon
for modules. Since x86 already names its own implementation of this as
arch_freq_scale, lets stick to that.
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The code for setting up the /chosen node in the device tree
and updating the memory reservation for the next kernel has been
moved to of_kexec_alloc_and_setup_fdt() defined in "drivers/of/kexec.c".
Use the common of_kexec_alloc_and_setup_fdt() to setup the device tree
and update the memory reservation for kexec for arm64.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-7-nramas@linux.microsoft.com
ELF related fields elf_headers, elf_headers_sz, and elf_headers_mem
have been moved from 'struct kimage_arch' to 'struct kimage' as
elf_headers, elf_headers_sz, and elf_load_addr respectively.
Use the ELF fields defined in 'struct kimage'.
Suggested-by: Rob Herring <robh@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20210221174930.27324-3-nramas@linux.microsoft.com
The documented behaviour for CMDLINE_EXTEND is that the arguments from
the bootloader are appended to the built-in kernel command line. This
also matches the option parsing behaviour for the EFI stub and early ID
register overrides.
Bizarrely, the fdt behaviour is the other way around: appending the
built-in command line to the bootloader arguments, resulting in a
command-line that doesn't necessarily line-up with the parsing order and
definitely doesn't line-up with the documented behaviour.
As it turns out, there is a proposal [1] to replace CMDLINE_EXTEND with
CMDLINE_PREPEND and CMDLINE_APPEND options which should hopefully make
the intended behaviour much clearer. While we wait for those to land,
drop CMDLINE_EXTEND for now as there appears to be little enthusiasm for
changing the current FDT behaviour.
[1] https://lore.kernel.org/lkml/20190319232448.45964-2-danielwa@cisco.com/
Cc: Max Uvarov <muvarov@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Doug Anderson <dianders@chromium.org>
Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/CAL_JsqJX=TCCs7=gg486r9TN4NYscMTCLNfqJF9crskKPq-bTg@mail.gmail.com
Link: https://lore.kernel.org/r/20210303134927.18975-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The built-in kernel commandline (CONFIG_CMDLINE) can be configured in
three different ways:
1. CMDLINE_FORCE: Use CONFIG_CMDLINE instead of any bootloader args
2. CMDLINE_EXTEND: Append the bootloader args to CONFIG_CMDLINE
3. CMDLINE_FROM_BOOTLOADER: Only use CONFIG_CMDLINE if there aren't
any bootloader args.
The early cmdline parsing to detect idreg overrides gets (2) and (3)
slightly wrong: in the case of (2) the bootloader args are parsed first
and in the case of (3) the CMDLINE is always parsed.
Fix these issues by moving the bootargs parsing out into a helper
function and following the same logic as that used by the EFI stub.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Fixes: 3320030355 ("arm64: cpufeature: Add an early command-line cpufeature override facility")
Link: https://lore.kernel.org/r/20210303134927.18975-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When running under a nesting hypervisor, it isn't guaranteed that
the virtual HW will include a PMU. In which case, let's not try
to access the PMU registers in the world switch, as that'd be
deadly.
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20210209114844.3278746-3-maz@kernel.org
Message-Id: <20210305185254.3730990-6-maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmA4JRkQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpoWqD/9dbbqe8L701U6May1A/4hRsqL4THTA2flx
vNCNRBl6XV3l/wBCtL6waKy6tyO4lyM8XdUdEvo3Kxl2kGPb8eVfpyYL/+77HqyH
ctT4RMrs+84Mxn+5N6cM97hS1qVI2moTxxyvOEl/JTB7BYrutz9gvAoeY3/Dto47
J66oSaPeuqJ32TyihxfQHVxQopJcqFzDjyoYHGDu6ATio1PXfaIdTu8ywVYSECAh
pWI4rwnqdurGuHMNpxyL1bA6CT/jC7s+sqU7bUYUCgtYI3eG0u3V0bp5gAQQIgl9
5sxxE3DidYGAkYZsosrelshBtzGddLdz4Qrt2ungMYv8RsGNpFQ095jDPKDwFaZj
bSvSsfplCo7iFsJByb1TtpNEOW8eAwi81PmBDVQ9Oq5P5ygTYno9GBDc/20ql0Fk
q6wcX28coE3IBw44ne0hIwvBOtXV4WJyluG/gqOxfbTH+kOy3pDsN8lWcY/P4X0U
yzdU2MLHe8BNMyYlUiBF47Amzt4ltr85P4XD3WZ4bX71iwri6HvrdGWLuuKwX+Ie
66QiIDDQIYZQ6NMMJWS9DGW3y3DBizpSXGxONbOw1J2bQdNmtToR0D2UnK/9UnKp
msnvkUNk8fkYGS4aptpJ6HxbmjMEG5YtbiGlPj6fz5/7MTvhRjPxt7A0LWrUIdqR
f88+sHUMqg==
=oc8u
-----END PGP SIGNATURE-----
Merge tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block
Pull io_uring thread rewrite from Jens Axboe:
"This converts the io-wq workers to be forked off the tasks in question
instead of being kernel threads that assume various bits of the
original task identity.
This kills > 400 lines of code from io_uring/io-wq, and it's the worst
part of the code. We've had several bugs in this area, and the worry
is always that we could be missing some pieces for file types doing
unusual things (recent /dev/tty example comes to mind, userfaultfd
reads installing file descriptors is another fun one... - both of
which need special handling, and I bet it's not the last weird oddity
we'll find).
With these identical workers, we can have full confidence that we're
never missing anything. That, in itself, is a huge win. Outside of
that, it's also more efficient since we're not wasting space and code
on tracking state, or switching between different states.
I'm sure we're going to find little things to patch up after this
series, but testing has been pretty thorough, from the usual
regression suite to production. Any issue that may crop up should be
manageable.
There's also a nice series of further reductions we can do on top of
this, but I wanted to get the meat of it out sooner rather than later.
The general worry here isn't that it's fundamentally broken. Most of
the little issues we've found over the last week have been related to
just changes in how thread startup/exit is done, since that's the main
difference between using kthreads and these kinds of threads. In fact,
if all goes according to plan, I want to get this into the 5.10 and
5.11 stable branches as well.
That said, the changes outside of io_uring/io-wq are:
- arch setup, simple one-liner to each arch copy_thread()
implementation.
- Removal of net and proc restrictions for io_uring, they are no
longer needed or useful"
* tag 'io_uring-worker.v3-2021-02-25' of git://git.kernel.dk/linux-block: (30 commits)
io-wq: remove now unused IO_WQ_BIT_ERROR
io_uring: fix SQPOLL thread handling over exec
io-wq: improve manager/worker handling over exec
io_uring: ensure SQPOLL startup is triggered before error shutdown
io-wq: make buffered file write hashed work map per-ctx
io-wq: fix race around io_worker grabbing
io-wq: fix races around manager/worker creation and task exit
io_uring: ensure io-wq context is always destroyed for tasks
arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread()
io_uring: cleanup ->user usage
io-wq: remove nr_process accounting
io_uring: flag new native workers with IORING_FEAT_NATIVE_WORKERS
net: remove cmsg restriction from io_uring based send/recvmsg calls
Revert "proc: don't allow async path resolution of /proc/self components"
Revert "proc: don't allow async path resolution of /proc/thread-self components"
io_uring: move SQPOLL thread io-wq forked worker
io-wq: make io_wq_fork_thread() available to other users
io-wq: only remove worker from free_list, if it was there
io_uring: remove io_identity
io_uring: remove any grabbing of context
...
I have a handful of new RISC-V related patches for this merge window:
* A check to ensure drivers are properly using uaccess. This isn't
manifesting with any of the drivers I'm currently using, but may catch
errors in new drivers.
* Some preliminary support for the FU740, along with the HiFive
Unleashed it will appear on.
* NUMA support for RISC-V, which involves making the arm64 code generic.
* Support for kasan on the vmalloc region.
* A handful of new drivers for the Kendryte K210, along with the DT
plumbing required to boot on a handful of K210-based boards.
* Support for allocating ASIDs.
* Preliminary support for kernels larger than 128MiB.
* Various other improvements to our KASAN support, including the
utilization of huge pages when allocating the KASAN regions.
We may have already found a bug with the KASAN_VMALLOC code, but it's
passing my tests. There's a fix in the works, but that will probably
miss the merge window.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmA4hXATHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYifryD/0SfXGOfj93Cxq7I7AYhhzCN7lJ5jvv
iEQScTlPqU9nfvYodo4EDq0fp+5LIPpTL/XBHtqVjzv0FqRNa28Ea0K7kO8HuXc4
BaUd0m/DqyB4Gfgm4qjc5bDneQ1ZYxVXprYERWNQ5Fj+tdWhaQGOW64N/TVodjjj
NgJtTqbIAcjJqjUtttM8TZN5U1TgwLo+KCqw3iYW12lV1YKBBuvrwvSdD6jnFdIQ
AzG/wRGZhxLoFxgBB/NEsZxDoSd6ztiwxLhS9lX4okZVsryyIdOE70Q/MflfiTlU
xE+AdxQXTMUiiqYSmHeDD6PDb57GT/K3hnjI1yP+lIZpbInsi29JKow1qjyYjfHl
9cSSKYCIXHL7jKU6pgt34G1O5N5+fgqHQhNbfKvlrQ2UPlfs/tWdKHpFIP/z9Jlr
0vCAou7NSEB9zZGqzO63uBLXoN8yfL8FT3uRnnRvoRpfpex5dQX2QqPLQ7327D7N
GUG31nd1PHTJPdxJ1cI4SO24PqPpWDWY9uaea+0jv7ivGClVadZPco/S3ZKloguT
lazYUvyA4oRrSAyln785Rd8vg4CinqTxMtIyZbRMbNkgzVQARi9a8rjvu4n9qms2
2wlXDFi8nR8B4ih5n79dSiiLM9ay9GJDxMcf9VxIxSAYZV2fJALnpK6gV2fzRBUe
+k/uv8BIsFmlwQ==
=CutX
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
"A handful of new RISC-V related patches for this merge window:
- A check to ensure drivers are properly using uaccess. This isn't
manifesting with any of the drivers I'm currently using, but may
catch errors in new drivers.
- Some preliminary support for the FU740, along with the HiFive
Unleashed it will appear on.
- NUMA support for RISC-V, which involves making the arm64 code
generic.
- Support for kasan on the vmalloc region.
- A handful of new drivers for the Kendryte K210, along with the DT
plumbing required to boot on a handful of K210-based boards.
- Support for allocating ASIDs.
- Preliminary support for kernels larger than 128MiB.
- Various other improvements to our KASAN support, including the
utilization of huge pages when allocating the KASAN regions.
We may have already found a bug with the KASAN_VMALLOC code, but it's
passing my tests. There's a fix in the works, but that will probably
miss the merge window.
* tag 'riscv-for-linus-5.12-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (75 commits)
riscv: Improve kasan population by using hugepages when possible
riscv: Improve kasan population function
riscv: Use KASAN_SHADOW_INIT define for kasan memory initialization
riscv: Improve kasan definitions
riscv: Get rid of MAX_EARLY_MAPPING_SIZE
soc: canaan: Sort the Makefile alphabetically
riscv: Disable KSAN_SANITIZE for vDSO
riscv: Remove unnecessary declaration
riscv: Add Canaan Kendryte K210 SD card defconfig
riscv: Update Canaan Kendryte K210 defconfig
riscv: Add Kendryte KD233 board device tree
riscv: Add SiPeed MAIXDUINO board device tree
riscv: Add SiPeed MAIX GO board device tree
riscv: Add SiPeed MAIX DOCK board device tree
riscv: Add SiPeed MAIX BiT board device tree
riscv: Update Canaan Kendryte K210 device tree
dt-bindings: add resets property to dw-apb-timer
dt-bindings: fix sifive gpio properties
dt-bindings: update sifive uart compatible string
dt-bindings: update sifive clint compatible string
...
- Fix lockdep false alarm on resume-from-cpuidle path
- Fix memory leak in kexec_file
- Fix module linker script to work with GDB
- Fix error code when trying to use uprobes with AArch32 instructions
- Fix late VHE enabling with 64k pages
- Add missing ISBs after TLB invalidation
- Fix seccomp when tracing syscall -1
- Fix stacktrace return code at end of stack
- Fix inconsistent whitespace for pointer return values
- Fix compiler warnings when building with W=1
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmA40kUQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLMUB/93o3Ucd3SeLLmOziyZMWjxCNcuzXAXDhFH
z0q0Zq8U5+xHaCH+jPASNwS7gT6dMX8E60SlXcvVaHuBaH5zsrZnOtpJ5mZQAQ7E
nR1M5ANfusMJ8uRpDHhy5ymJ4IcE/yn74rapBIeGs1e4vWF60Lb6nSVrEJMNRada
zbRr2z9bMecQPGX+KSWpgYg4dLRpyTo8oSYJiYmyoSczGvXhrFHlnIJeaKrJuvGt
IIhil8l9uZd5j0ucVWGiYgAcAuqzgkH2yEiNbkGRwn0nMK+4HGbXpEuzUm/90p3y
lRLQSvx/hKwerIlodUYbFDx4FMXoFfMRQm/8/6tCBrUn/4exDslZ
=wuLk
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The big one is a fix for the VHE enabling path during early boot,
where the code enabling the MMU wasn't necessarily in the identity map
of the new page-tables, resulting in a consistent crash with 64k
pages. In fixing that, we noticed some missing barriers too, so we
added those for the sake of architectural compliance.
Other than that, just the usual merge window trickle. There'll be more
to come, too.
Summary:
- Fix lockdep false alarm on resume-from-cpuidle path
- Fix memory leak in kexec_file
- Fix module linker script to work with GDB
- Fix error code when trying to use uprobes with AArch32 instructions
- Fix late VHE enabling with 64k pages
- Add missing ISBs after TLB invalidation
- Fix seccomp when tracing syscall -1
- Fix stacktrace return code at end of stack
- Fix inconsistent whitespace for pointer return values
- Fix compiler warnings when building with W=1"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: stacktrace: Report when we reach the end of the stack
arm64: ptrace: Fix seccomp of traced syscall -1 (NO_SYSCALL)
arm64: Add missing ISB after invalidating TLB in enter_vhe
arm64: Add missing ISB after invalidating TLB in __primary_switch
arm64: VHE: Enable EL2 MMU from the idmap
KVM: arm64: make the hyp vector table entries local
arm64/mm: Fixed some coding style issues
arm64: uprobe: Return EOPNOTSUPP for AARCH32 instruction probing
kexec: move machine_kexec_post_load() to public interface
arm64 module: set plt* section addresses to 0x0
arm64: kexec_file: fix memory leakage in create_dtb() when fdt_open_into() fails
arm64: spectre: Prevent lockdep splat on v4 mitigation enable path
This change provides a simpler implementation of mte_get_mem_tag(),
mte_get_random_tag(), and mte_set_mem_tag_range().
Simplifications include removing system_supports_mte() checks as these
functions are onlye called from KASAN runtime that had already checked
system_supports_mte(). Besides that, size and address alignment checks
are removed from mte_set_mem_tag_range(), as KASAN now does those.
This change also moves these functions into the asm/mte-kasan.h header and
implements mte_set_mem_tag_range() via inline assembly to avoid
unnecessary functions calls.
[vincenzo.frascino@arm.com: fix warning in mte_get_random_tag()]
Link: https://lkml.kernel.org/r/20210211152208.23811-1-vincenzo.frascino@arm.com
Link: https://lkml.kernel.org/r/a26121b294fdf76e369cb7a74351d1c03a908930.1612546384.git.andreyknvl@google.com
Co-developed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the arm64 unwinder code returns -EINVAL whenever it can't find
the next stack frame, not distinguishing between cases where the stack has
been corrupted or is otherwise in a state it shouldn't be and cases
where we have reached the end of the stack. At the minute none of the
callers care what error code is returned but this will be important for
reliable stack trace which needs to be sure that the stack is intact.
Change to return -ENOENT in the case where we reach the bottom of the
stack. The error codes from this function are only used in kernel, this
particular code is chosen as we are indicating that we know there is no
frame there.
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224165037.24138-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Since commit f086f67485 ("arm64: ptrace: add support for syscall
emulation"), if system call number -1 is called and the process is being
traced with PTRACE_SYSCALL, for example by strace, the seccomp check is
skipped and -ENOSYS is returned unconditionally (unless altered by the
tracer) rather than carrying out action specified in the seccomp filter.
The consequence of this is that it is not possible to reliably strace
a seccomp based implementation of a foreign system call interface in
which r7/x8 is permitted to be -1 on entry to a system call.
Also trace_sys_enter and audit_syscall_entry are skipped if a system
call is skipped.
Fix by removing the in_syscall(regs) check restoring the previous
behaviour which is like AArch32, x86 (which uses generic code) and
everything else.
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Catalin Marinas<catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>
Fixes: f086f67485 ("arm64: ptrace: add support for syscall emulation")
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Timothy E Baldwin <T.E.Baldwin99@members.leeds.ac.uk>
Link: https://lore.kernel.org/r/90edd33b-6353-1228-791f-0336d94d5f8c@majoroak.me.uk
Signed-off-by: Will Deacon <will@kernel.org>
On a high level, this patch allows running KUnit KASAN tests with the
hardware tag-based KASAN mode.
Internally, this change reenables tag checking at the end of each KASAN
test that triggers a tag fault and leads to tag checking being disabled.
Also simplify is_write calculation in report_tag_fault.
With this patch KASAN tests are still failing for the hardware tag-based
mode; fixes come in the next few patches.
[andreyknvl@google.com: export HW_TAGS symbols for KUnit tests]
Link: https://lkml.kernel.org/r/e7eeb252da408b08f0c81b950a55fb852f92000b.1613155970.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/Id94dc9eccd33b23cda4950be408c27f879e474c8
Link: https://lkml.kernel.org/r/51b23112cf3fd62b8f8e9df81026fa2b15870501.1610733117.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Although there has been a bit of back and forth on the subject, it
appears that invalidating TLBs requires an ISB instruction after the
TLBI/DSB sequence when FEAT_ETS is not implemented by the CPU.
From the bible:
| In an implementation that does not implement FEAT_ETS, a TLB
| maintenance instruction executed by a PE, PEx, can complete at any
| time after it is issued, but is only guaranteed to be finished for a
| PE, PEx, after the execution of DSB by the PEx followed by a Context
| synchronization event
Add the missing ISB in enter_vhe(), just in case.
Fixes: f359182291 ("arm64: Provide an 'upgrade to VHE' stub hypercall")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224093738.3629662-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Although there has been a bit of back and forth on the subject, it
appears that invalidating TLBs requires an ISB instruction when FEAT_ETS
is not implemented by the CPU.
From the bible:
| In an implementation that does not implement FEAT_ETS, a TLB
| maintenance instruction executed by a PE, PEx, can complete at any
| time after it is issued, but is only guaranteed to be finished for a
| PE, PEx, after the execution of DSB by the PEx followed by a Context
| synchronization event
Add the missing ISB in __primary_switch, just in case.
Fixes: 3c5e9f238b ("arm64: head.S: move KASLR processing out of __enable_mmu()")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210224093738.3629662-3-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Enabling the MMU requires the write to SCTLR_ELx (and the ISB
that follows) to live in some identity-mapped memory. Otherwise,
the translation will result in something totally unexpected
(either fetching the wrong instruction stream, or taking a
fault of some sort).
This is exactly what happens in mutate_to_vhe(), as this code
lives in the .hyp.text section, which isn't identity-mapped.
With the right configuration, this explodes badly.
Extract the MMU-enabling part of mutate_to_vhe(), and move
it to its own function that lives in the idmap. This ensures
nothing bad happens.
Fixes: f359182291 ("arm64: Provide an 'upgrade to VHE' stub hypercall")
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Tested-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210224093738.3629662-2-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
- Clang LTO build infrastructure and arm64-specific enablement (Sami Tolvanen)
- Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmA0OEYACgkQiXL039xt
wCYGJw/8CcyvQUGmXYEZVDLMahKz93RYijiGuSTVnhl0pNAyfOojaZ8Z//eD1VNA
s82azW1XybbA6RnPGD7YQzYz27cSF2qUFDmplwVfE4mwBnPXzRxtVBDLSxksP1HS
77sCOu91QhbovPCWET4dSHLJB3DVc78FiW4lVlRgrglyAz+dut1iXYar5e7VNoS0
S4MwnqwteHC6YXP619rubhpdDoj7njuw1uxRIaodt9S/zRSpl5MCUgHmzQusgezs
yWDdPHPWHnF7xxKgwSvE7AKZPdOnIxKxRi6Yd6vUIyrYB3qLZkFe75nUsgMroAhs
/Bgrn69U2McMiJsOdh0ERzP2VNYfvMacBQ308nb45j83Bgv5l6uj8QOZU4ZogmXV
PsDzsfUe9GsxgYexfozGX61rpd6JinzQKVyoDW3oTT54fbBxO3uDqT8kOBw72dPT
9nkOxTzyb+UO0dpb/MhXLGkGcv8+lTA5ffVIKUx5UxKngRbukc3dxwVJgO4HmucK
bwVQGD83D+/if5/JL9WtQRjDwFEn+IFmdv+3cAXkRo4IIS18LPZB1MJncTeWr8Z9
HlkuDXlJOncUWCABGd1IKu1j0S2HpXV4qhqQXJ6PdfOvUPEaD9qgqEAjD5FxxyXF
wpaV2MWya5i1FGwD5UKhi8hVnAFJyF0/w+enjiPwlmIbjdyEVXE=
=6peY
-----END PGP SIGNATURE-----
Merge tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull clang LTO updates from Kees Cook:
"Clang Link Time Optimization.
This is built on the work done preparing for LTO by arm64 folks,
tracing folks, etc. This includes the core changes as well as the
remaining pieces for arm64 (LTO has been the default build method on
Android for about 3 years now, as it is the prerequisite for the
Control Flow Integrity protections).
While x86 LTO enablement is done, it depends on some pending objtool
clean-ups. It's possible that I'll send a "part 2" pull request for
LTO that includes x86 support.
For merge log posterity, and as detailed in commit dc5723b02e
("kbuild: add support for Clang LTO"), here is the lt;dr to do an LTO
build:
make LLVM=1 LLVM_IAS=1 defconfig
scripts/config -e LTO_CLANG_THIN
make LLVM=1 LLVM_IAS=1
(To do a cross-compile of arm64, add "CROSS_COMPILE=aarch64-linux-gnu-"
and "ARCH=arm64" to the "make" command lines.)
Summary:
- Clang LTO build infrastructure and arm64-specific enablement (Sami
Tolvanen)
- Recursive build CC_FLAGS_LTO fix (Alexander Lobakin)"
* tag 'clang-lto-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
kbuild: prevent CC_FLAGS_LTO self-bloating on recursive rebuilds
arm64: allow LTO to be selected
arm64: disable recordmcount with DYNAMIC_FTRACE_WITH_REGS
arm64: vdso: disable LTO
drivers/misc/lkdtm: disable LTO for rodata.o
efi/libstub: disable LTO
scripts/mod: disable LTO for empty.c
modpost: lto: strip .lto from module names
PCI: Fix PREL32 relocations for LTO
init: lto: fix PREL32 relocations
init: lto: ensure initcall ordering
kbuild: lto: add a default list of used symbols
kbuild: lto: merge module sections
kbuild: lto: limit inlining
kbuild: lto: fix module versioning
kbuild: add support for Clang LTO
tracing: move function tracer options to Kconfig
As stated in linux/errno.h, ENOTSUPP should never be seen by user programs.
When we set up uprobe with 32-bit perf and arm64 kernel, we would see the
following vague error without useful hint.
The sys_perf_event_open() syscall returned with 524 (INTERNAL ERROR:
strerror_r(524, [buf], 128)=22)
Use EOPNOTSUPP instead to indicate such cases.
Signed-off-by: He Zhe <zhe.he@windriver.com>
Link: https://lore.kernel.org/r/20210223082535.48730-1-zhe.he@windriver.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
sense that we don't assign ->set_child_tid with our own structure. Just
ensure that every arch sets up the PF_IO_WORKER threads like kthreads
in the arch implementation of copy_thread().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- Support for userspace to emulate Xen hypercalls
- Raise the maximum number of user memslots
- Scalability improvements for the new MMU. Instead of the complex
"fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an
rwlock so that page faults are concurrent, but the code that can run
against page faults is limited. Right now only page faults take the
lock for reading; in the future this will be extended to some
cases of page table destruction. I hope to switch the default MMU
around 5.12-rc3 (some testing was delayed due to Chinese New Year).
- Cleanups for MAXPHYADDR checks
- Use static calls for vendor-specific callbacks
- On AMD, use VMLOAD/VMSAVE to save and restore host state
- Stop using deprecated jump label APIs
- Workaround for AMD erratum that made nested virtualization unreliable
- Support for LBR emulation in the guest
- Support for communicating bus lock vmexits to userspace
- Add support for SEV attestation command
- Miscellaneous cleanups
PPC:
- Support for second data watchpoint on POWER10
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes
ARM64
- Make the nVHE EL2 object relocatable
- Cleanups for concurrent translation faults hitting the same page
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Simplification of the early init hypercall handling
Non-KVM changes (with acks):
- Detection of contended rwlocks (implemented only for qrwlocks,
because KVM only needs it for x86)
- Allow __DISABLE_EXPORTS from assembly code
- Provide a saner follow_pfn replacements for modules
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmApSRgUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOc7wf9FnlinKoTFaSk7oeuuhF/CoCVwSFs
Z9+A2sNI99tWHQxFR6dyDkEFeQoXnqSxfLHtUVIdH/JnTg0FkEvFz3NK+0PzY1PF
PnGNbSoyhP58mSBG4gbBAxdF3ZJZMB8GBgYPeR62PvMX2dYbcHqVBNhlf6W4MQK4
5mAUuAnbf19O5N267sND+sIg3wwJYwOZpRZB7PlwvfKAGKf18gdBz5dQ/6Ej+apf
P7GODZITjqM5Iho7SDm/sYJlZprFZT81KqffwJQHWFMEcxFgwzrnYPx7J3gFwRTR
eeh9E61eCBDyCTPpHROLuNTVBqrAioCqXLdKOtO5gKvZI3zmomvAsZ8uXQ==
=uFZU
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"x86:
- Support for userspace to emulate Xen hypercalls
- Raise the maximum number of user memslots
- Scalability improvements for the new MMU.
Instead of the complex "fast page fault" logic that is used in
mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
but the code that can run against page faults is limited. Right now
only page faults take the lock for reading; in the future this will
be extended to some cases of page table destruction. I hope to
switch the default MMU around 5.12-rc3 (some testing was delayed
due to Chinese New Year).
- Cleanups for MAXPHYADDR checks
- Use static calls for vendor-specific callbacks
- On AMD, use VMLOAD/VMSAVE to save and restore host state
- Stop using deprecated jump label APIs
- Workaround for AMD erratum that made nested virtualization
unreliable
- Support for LBR emulation in the guest
- Support for communicating bus lock vmexits to userspace
- Add support for SEV attestation command
- Miscellaneous cleanups
PPC:
- Support for second data watchpoint on POWER10
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes
ARM64:
- Make the nVHE EL2 object relocatable
- Cleanups for concurrent translation faults hitting the same page
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Simplification of the early init hypercall handling
Non-KVM changes (with acks):
- Detection of contended rwlocks (implemented only for qrwlocks,
because KVM only needs it for x86)
- Allow __DISABLE_EXPORTS from assembly code
- Provide a saner follow_pfn replacements for modules"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
KVM: selftests: Don't bother mapping GVA for Xen shinfo test
KVM: selftests: Fix hex vs. decimal snafu in Xen test
KVM: selftests: Fix size of memslots created by Xen tests
KVM: selftests: Ignore recently added Xen tests' build output
KVM: selftests: Add missing header file needed by xAPIC IPI tests
KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
locking/arch: Move qrwlock.h include after qspinlock.h
KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
KVM: PPC: remove unneeded semicolon
KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
KVM: PPC: Book3S HV: Fix radix guest SLB side channel
KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
...
- vDSO build improvements including support for building with BSD.
- Cleanup to the AMU support code and initialisation rework to support
cpufreq drivers built as modules.
- Removal of synthetic frame record from exception stack when entering
the kernel from EL0.
- Add support for the TRNG firmware call introduced by Arm spec
DEN0098.
- Cleanup and refactoring across the board.
- Avoid calling arch_get_random_seed_long() from
add_interrupt_randomness()
- Perf and PMU updates including support for Cortex-A78 and the v8.3
SPE extensions.
- Significant steps along the road to leaving the MMU enabled during
kexec relocation.
- Faultaround changes to initialise prefaulted PTEs as 'old' when
hardware access-flag updates are supported, which drastically
improves vmscan performance.
- CPU errata updates for Cortex-A76 (#1463225) and Cortex-A55
(#1024718)
- Preparatory work for yielding the vector unit at a finer granularity
in the crypto code, which in turn will one day allow us to defer
softirq processing when it is in use.
- Support for overriding CPU ID register fields on the command-line.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmAmwZcQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLA1B/0XMwWUhmJ4ZPK4sr28YWHNGLuCFHDgkMKU
dEmS806OF9d0J7fTczGsKdS4IKtXWko67Z0UGiPIStwfm0itSW2Zgbo9KZeDPqPI
fH0s23nQKxUMyNW7b9p4cTV3YuGVMZSBoMug2jU2DEDpSqeGBk09NPi6inERBCz/
qZxcqXTKxXbtOY56eJmq09UlFZiwfONubzuCrrUH7LU8ZBSInM/6Q4us/oVm4zYI
Pnv996mtL4UxRqq/KoU9+cQ1zsI01kt9/coHwfCYvSpZEVAnTWtfECsJ690tr3mF
TSKQLvOzxbDtU+HcbkNVKW0A38EIO1xXr8yXW9SJx6BJBkyb24xo
=IwMb
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- vDSO build improvements including support for building with BSD.
- Cleanup to the AMU support code and initialisation rework to support
cpufreq drivers built as modules.
- Removal of synthetic frame record from exception stack when entering
the kernel from EL0.
- Add support for the TRNG firmware call introduced by Arm spec
DEN0098.
- Cleanup and refactoring across the board.
- Avoid calling arch_get_random_seed_long() from
add_interrupt_randomness()
- Perf and PMU updates including support for Cortex-A78 and the v8.3
SPE extensions.
- Significant steps along the road to leaving the MMU enabled during
kexec relocation.
- Faultaround changes to initialise prefaulted PTEs as 'old' when
hardware access-flag updates are supported, which drastically
improves vmscan performance.
- CPU errata updates for Cortex-A76 (#1463225) and Cortex-A55
(#1024718)
- Preparatory work for yielding the vector unit at a finer granularity
in the crypto code, which in turn will one day allow us to defer
softirq processing when it is in use.
- Support for overriding CPU ID register fields on the command-line.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits)
drivers/perf: Replace spin_lock_irqsave to spin_lock
mm: filemap: Fix microblaze build failure with 'mmu_defconfig'
arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+
arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line
arm64: Defer enabling pointer authentication on boot core
arm64: cpufeatures: Allow disabling of BTI from the command-line
arm64: Move "nokaslr" over to the early cpufeature infrastructure
KVM: arm64: Document HVC_VHE_RESTART stub hypercall
arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0
arm64: Add an aliasing facility for the idreg override
arm64: Honor VHE being disabled from the command-line
arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
arm64: cpufeature: Add an early command-line cpufeature override facility
arm64: Extract early FDT mapping from kaslr_early_init()
arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
arm64: cpufeature: Add global feature override facility
arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
arm64: Simplify init_el2_state to be non-VHE only
arm64: Move VHE-specific SPE setup to mutate_to_vhe()
arm64: Drop early setting of MDSCR_EL2.TPMS
...
The Spectre-v4 workaround is re-configured when resuming from suspend,
as the firmware may have re-enabled the mitigation despite the user
previously asking for it to be disabled.
Enabling or disabling the workaround can result in an undefined
instruction exception on CPUs which implement PSTATE.SSBS but only allow
it to be configured by adjusting the SPSR on exception return. We handle
this by installing an 'undef hook' which effectively emulates the access.
Installing this hook requires us to take a couple of spinlocks both to
avoid corrupting the internal list of hooks but also to ensure that we
don't run into an unhandled exception. Unfortunately, when resuming from
suspend, we haven't yet called rcu_idle_exit() and so lockdep gets angry
about "suspicious RCU usage". In doing so, it tries to print a warning,
which leads it to get even more suspicious, this time about itself:
| rcu_scheduler_active = 2, debug_locks = 1
| RCU used illegally from extended quiescent state!
| 1 lock held by swapper/0:
| #0: (logbuf_lock){-.-.}-{2:2}, at: vprintk_emit+0x88/0x198
|
| Call trace:
| dump_backtrace+0x0/0x1d8
| show_stack+0x18/0x24
| dump_stack+0xe0/0x17c
| lockdep_rcu_suspicious+0x11c/0x134
| trace_lock_release+0xa0/0x160
| lock_release+0x3c/0x290
| _raw_spin_unlock+0x44/0x80
| vprintk_emit+0xbc/0x198
| vprintk_default+0x44/0x6c
| vprintk_func+0x1f4/0x1fc
| printk+0x54/0x7c
| lockdep_rcu_suspicious+0x30/0x134
| trace_lock_acquire+0xa0/0x188
| lock_acquire+0x50/0x2fc
| _raw_spin_lock+0x68/0x80
| spectre_v4_enable_mitigation+0xa8/0x30c
| __cpu_suspend_exit+0xd4/0x1a8
| cpu_suspend+0xa0/0x104
| psci_cpu_suspend_enter+0x3c/0x5c
| psci_enter_idle_state+0x44/0x74
| cpuidle_enter_state+0x148/0x2f8
| cpuidle_enter+0x38/0x50
| do_idle+0x1f0/0x2b4
Prevent these splats by running __cpu_suspend_exit() with RCU watching.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Saravana Kannan <saravanak@google.com>
Suggested-by: "Paul E . McKenney" <paulmck@kernel.org>
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Fixes: c28762070c ("arm64: Rewrite Spectre-v4 mitigation code")
Cc: <stable@vger.kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210218140346.5224-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The ptrace(PTRACE_PEEKMTETAGS) implementation checks whether the user
page has valid tags (mapped with PROT_MTE) by testing the PG_mte_tagged
page flag. If this bit is cleared, ptrace(PTRACE_PEEKMTETAGS) returns
-EIO.
A newly created (PROT_MTE) mapping points to the zero page which had its
tags zeroed during cpu_enable_mte(). If there were no prior writes to
this mapping, ptrace(PTRACE_PEEKMTETAGS) fails with -EIO since the zero
page does not have the PG_mte_tagged flag set.
Set PG_mte_tagged on the zero page when its tags are cleared during
boot. In addition, to avoid ptrace(PTRACE_PEEKMTETAGS) succeeding on
!PROT_MTE mappings pointing to the zero page, change the
__access_remote_tags() check to (vm_flags & VM_MTE) instead of
PG_mte_tagged.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Fixes: 34bfeea4a9 ("arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE")
Cc: <stable@vger.kernel.org> # 5.10.x
Cc: Will Deacon <will@kernel.org>
Reported-by: Luis Machado <luis.machado@linaro.org>
Tested-by: Luis Machado <luis.machado@linaro.org>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://lore.kernel.org/r/20210210180316.23654-1-catalin.marinas@arm.com
vDSO build improvements.
* for-next/vdso:
arm64: Support running gen_vdso_offsets.sh with BSD userland.
arm64: do not descend to vdso directories twice
Cleanup to the AMU support code and initialisation rework to support
cpufreq drivers built as modules.
* for-next/topology:
arm64: topology: Make AMUs work with modular cpufreq drivers
arm64: topology: Reorder init_amu_fie() a bit
arm64: topology: Avoid the have_policy check
Miscellaneous arm64 changes for 5.12.
* for-next/misc:
arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+
arm64: vmlinux.ld.S: add assertion for tramp_pg_dir offset
arm64: vmlinux.ld.S: add assertion for reserved_pg_dir offset
arm64/ptdump:display the Linear Mapping start marker
arm64: ptrace: Fix missing return in hw breakpoint code
KVM: arm64: Move __hyp_set_vectors out of .hyp.text
arm64: Include linux/io.h in mm/mmap.c
arm64: cacheflush: Remove stale comment
arm64: mm: Remove unused header file
arm64/sparsemem: reduce SECTION_SIZE_BITS
arm64/mm: Add warning for outside range requests in vmemmap_populate()
arm64: Drop workaround for broken 'S' constraint with GCC 4.9
Significant steps along the road to leaving the MMU enabled during kexec
relocation.
* for-next/kexec:
arm64: hibernate: add __force attribute to gfp_t casting
arm64: kexec: arm64_relocate_new_kernel don't use x0 as temp
arm64: kexec: arm64_relocate_new_kernel clean-ups and optimizations
arm64: kexec: call kexec_image_info only once
arm64: kexec: move relocation function setup
arm64: trans_pgd: hibernate: idmap the single page that holds the copy page routines
arm64: mm: Always update TCR_EL1 from __cpu_set_tcr_t0sz()
arm64: trans_pgd: pass NULL instead of init_mm to *_populate functions
arm64: trans_pgd: pass allocator trans_pgd_create_copy
arm64: trans_pgd: make trans_pgd_map_page generic
arm64: hibernate: move page handling function to new trans_pgd.c
arm64: hibernate: variable pudp is used instead of pd4dp
arm64: kexec: make dtb_mem always enabled
Rework of the workaround for Cortex-A76 erratum 1463225 to fit in better
with the ongoing exception entry cleanups and changes to the detection
code for Cortex-A55 erratum 1024718 since it applies to all revisions of
the silicon.
* for-next/errata:
arm64: entry: consolidate Cortex-A76 erratum 1463225 workaround
arm64: Extend workaround for erratum 1024718 to all versions of Cortex-A55
Support for overriding CPU ID register fields on the command-line, which
allows us to disable certain features which the kernel would otherwise
use unconditionally when detected.
* for-next/cpufeature: (22 commits)
arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line
arm64: Defer enabling pointer authentication on boot core
arm64: cpufeatures: Allow disabling of BTI from the command-line
arm64: Move "nokaslr" over to the early cpufeature infrastructure
KVM: arm64: Document HVC_VHE_RESTART stub hypercall
arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0
arm64: Add an aliasing facility for the idreg override
arm64: Honor VHE being disabled from the command-line
arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
arm64: cpufeature: Add an early command-line cpufeature override facility
arm64: Extract early FDT mapping from kaslr_early_init()
arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
arm64: cpufeature: Add global feature override facility
arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
arm64: Simplify init_el2_state to be non-VHE only
arm64: Move VHE-specific SPE setup to mutate_to_vhe()
arm64: Drop early setting of MDSCR_EL2.TPMS
arm64: Initialise as nVHE before switching to VHE
arm64: Provide an 'upgrade to VHE' stub hypercall
arm64: Turn the MMU-on sequence into a macro
...
In order to be able to disable Pointer Authentication at runtime,
whether it is for testing purposes, or to work around HW issues,
let's add support for overriding the ID_AA64ISAR1_EL1.{GPI,GPA,API,APA}
fields.
This is further mapped on the arm64.nopauth command-line alias.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Tested-by: Srinivas Ramana <sramana@codeaurora.org>
Link: https://lore.kernel.org/r/20210208095732.3267263-23-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Defer enabling pointer authentication on boot core until
after its required to be enabled by cpufeature framework.
This will help in controlling the feature dynamically
with a boot parameter.
Signed-off-by: Ajay Patil <pajay@qti.qualcomm.com>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1610152163-16554-2-git-send-email-sramana@codeaurora.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-22-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
In order to be able to disable BTI at runtime, whether it is
for testing purposes, or to work around HW issues, let's add
support for overriding the ID_AA64PFR1_EL1.BTI field.
This is further mapped on the arm64.nobti command-line alias.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Tested-by: Srinivas Ramana <sramana@codeaurora.org>
Link: https://lore.kernel.org/r/20210208095732.3267263-21-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Given that the early cpufeature infrastructure has borrowed quite
a lot of code from the kaslr implementation, let's reimplement
the matching of the "nokaslr" option with it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-20-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Admitedly, passing id_aa64mmfr1.vh=0 on the command-line isn't
that easy to understand, and it is likely that users would much
prefer write "kvm-arm.mode=nvhe", or "...=protected".
So here you go. This has the added advantage that we can now
always honor the "kvm-arm.mode=protected" option, even when
booting on a VHE system.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-18-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
In order to map the override of idregs to options that a user
can easily understand, let's introduce yet another option
array, which maps an option to the corresponding idreg options.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-17-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Finally we can check whether VHE is disabled on the command line,
and not enable it if that's the user's wish.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-16-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
As we want to be able to disable VHE at runtime, let's match
"id_aa64mmfr1.vh=" from the command line as an override.
This doesn't have much effect yet as our boot code doesn't look
at the cpufeature, but only at the HW registers.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-15-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
In order to be able to override CPU features at boot time,
let's add a command line parser that matches options of the
form "cpureg.feature=value", and store the corresponding
value into the override val/mask pair.
No features are currently defined, so no expected change in
functionality.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-14-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
As we want to parse more options very early in the kernel lifetime,
let's always map the FDT early. This is achieved by moving that
code out of kaslr_early_init().
No functional change expected.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-13-maz@kernel.org
[will: Ensue KASAN is enabled before running C code]
Signed-off-by: Will Deacon <will@kernel.org>
__read_sysreg_by_encoding() is used by a bunch of cpufeature helpers,
which should take the feature override into account. Let's do that.
For a good measure (and because we are likely to need to further
down the line), make this helper available to the rest of the
non-modular kernel.
Code that needs to know the *real* features of a CPU can still
use read_sysreg_s(), and find the bare, ugly truth.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-12-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add a facility to globally override a feature, no matter what
the HW says. Yes, this sounds dangerous, but we do respect the
"safe" value for a given feature. This doesn't mean the user
doesn't need to know what they are doing.
Nothing uses this yet, so we are pretty safe. For now.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-11-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
We can now move the initial SCTLR_EL1 setup to be used for both
EL1 and EL2 setup.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-10-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
As init_el2_state is now nVHE only, let's simplify it and drop
the VHE setup.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-9-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
There isn't much that a VHE kernel needs on top of whatever has
been done for nVHE, so let's move the little we need to the
VHE stub (the SPE setup), and drop the init_el2_state macro.
No expected functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-8-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
As we are aiming to be able to control whether we enable VHE or
not, let's always drop down to EL1 first, and only then upgrade
to VHE if at all possible.
This means that if the kernel is booted at EL2, we always start
with a nVHE init, drop to EL1 to initialise the the kernel, and
only then upgrade the kernel EL to EL2 if possible (the process
is obviously shortened for secondary CPUs).
The resume path is handled similarly to a secondary CPU boot.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-6-maz@kernel.org
[will: Avoid calling switch_to_vhe twice on kaslr path]
Signed-off-by: Will Deacon <will@kernel.org>
The workaround for Cortex-A76 erratum 1463225 is split across the
syscall and debug handlers in separate files. This structure currently
forces us to do some redundant work for debug exceptions from EL0, is a
little difficult to follow, and gets in the way of some future rework of
the exception entry code as it requires exceptions to be unmasked late
in the syscall handling path.
To simplify things, and as a preparatory step for future rework of
exception entry, this patch moves all the workaround logic into
entry-common.c. As the debug handler only needs to run for EL1 debug
exceptions, we no longer call it for EL0 debug exceptions, and no longer
need to check user_mode(regs) as this is always false. For clarity
cortex_a76_erratum_1463225_debug_handler() is changed to return bool.
In the SVC path, the workaround is applied earlier, but this should have
no functional impact as exceptions are still masked. In the debug path
we run the fixup before explicitly disabling preemption, but we will not
attempt to preempt before returning from the exception.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210202120341.28858-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
As we are about to change the way a VHE system boots, let's
provide the core helper, in the form of a stub hypercall that
enables VHE and replicates the full EL1 context at EL2, thanks
to EL1 and VHE-EL2 being extremely similar.
On exception return, the kernel carries on at EL2. Fancy!
Nothing calls this new hypercall yet, so no functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: David Brazdil <dbrazdil@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-5-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Turning the MMU on is a popular sport in the arm64 kernel, and
we do it more than once, or even twice. As we are about to add
even more, let's turn it into a macro.
No expected functional change.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: David Brazdil <dbrazdil@google.com>
Link: https://lore.kernel.org/r/20210208095732.3267263-4-maz@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The erratum 1024718 affects Cortex-A55 r0p0 to r2p0. However
we apply the work around for r0p0 - r1p0. Unfortunately this
won't be fixed for the future revisions for the CPU. Thus
extend the work around for all versions of A55, to cover
for r2p0 and any future revisions.
Cc: stable@vger.kernel.org
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20210203230057.3961239-1-suzuki.poulose@arm.com
[will: Update Kconfig help text]
Signed-off-by: Will Deacon <will@kernel.org>
In a few places we don't have whitespace between macro parameters,
which makes them hard to read. This patch adds whitespace to clearly
separate the parameters.
In a few places we have unnecessary whitespace around unary operators,
which is confusing, This patch removes the unnecessary whitespace.
Signed-off-by: Zhiyuan Dai <daizhiyuan@phytium.com.cn>
Link: https://lore.kernel.org/r/1612403029-5011-1-git-send-email-daizhiyuan@phytium.com.cn
Signed-off-by: Will Deacon <will@kernel.org>
Add TRAMP_SWAPPER_OFFSET and use that instead of hardcoding
the offset between swapper_pg_dir and tramp_pg_dir.
Then use TRAMP_SWAPPER_OFFSET to assert that the offset is
correct at link time.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210202123658.22308-3-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Add RESERVED_SWAPPER_OFFSET and use that instead of hardcoding
the offset between swapper_pg_dir and reserved_pg_dir.
Then use RESERVED_SWAPPER_OFFSET to assert that the offset is
correct at link time.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210202123658.22308-2-joey.gouly@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
When delivering a hw-breakpoint SIGTRAP to a compat task via ptrace, the
lack of a 'return' statement means we fallthrough to the native case,
which differs in its handling of 'si_errno'.
Although this looks to be harmless because the subsequent signal is
effectively ignored, it's confusing and unintentional, so add the
missing 'return'.
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Link: https://lore.kernel.org/r/20210202002109.GA624440@juliacomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
The only usage of these is to put their addresses in an array of
pointers to const attribute_group structs. Make them const to allow the
compiler to put them in read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
Two new warnings are reported by sparse:
"sparse warnings: (new ones prefixed by >>)"
>> arch/arm64/kernel/hibernate.c:181:39: sparse: sparse: cast to
restricted gfp_t
>> arch/arm64/kernel/hibernate.c:202:44: sparse: sparse: cast from
restricted gfp_t
gfp_t has __bitwise type attribute and requires __force added to casting
in order to avoid these warnings.
Fixes: 50f53fb721 ("arm64: trans_pgd: make trans_pgd_map_page generic")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20210201150306.54099-2-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
The .hyp.text section is supposed to be reserved for the nVHE EL2 code.
However, there is currently one occurrence of EL1 executing code located
in .hyp.text when calling __hyp_{re}set_vectors(), which happen to sit
next to the EL2 stub vectors. While not a problem yet, such patterns
will cause issues when removing the host kernel from the TCB, so a
cleaner split would be preferable.
Fix this by delimiting the end of the .hyp.text section in hyp-stub.S.
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20210128173850.2478161-1-qperret@google.com
Signed-off-by: Will Deacon <will@kernel.org>
x0 will contain the only argument to arm64_relocate_new_kernel; don't
use it as a temp. Reassigned registers to free-up x0 so we won't need
to copy argument, and can use it at the beginning and at the end of the
function.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-13-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
In preparation to bigger changes to arm64_relocate_new_kernel that would
enable this function to do MMU backed memory copy, do few clean-ups and
optimizations. These include:
1. Call raw_dcache_line_size() only when relocation is actually going to
happen. i.e. kdump type kexec, does not need it.
2. copy_page(dest, src, tmps...) increments dest and src by PAGE_SIZE, so
no need to store dest prior to calling copy_page and increment it
after. Also, src is not used after a copy, not need to copy either.
3. For consistency use comment on the same line with instruction when it
describes the instruction itself.
4. Some comment corrections
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-12-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently, kexec_image_info() is called during load time, and
right before kernel is being kexec'ed. There is no need to do both.
So, call it only once when segments are loaded and the physical
location of page with copy of arm64_relocate_new_kernel is known.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-11-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently, kernel relocation function is configured in machine_kexec()
at the time of kexec reboot by using control_code_page.
This operation, however, is more logical to be done during kexec_load,
and thus remove from reboot time. Move, setup of this function to
newly added machine_kexec_post_load().
Because once MMU is enabled, kexec control page will contain more than
relocation kernel, but also vector table, add pointer to the actual
function within this page arch.kern_reloc. Currently, it equals to the
beginning of page, we will add offsets later, when vector table is
added.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-10-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
To resume from hibernate, the contents of memory are restored from
the swap image. This may overwrite any page, including the running
kernel and its page tables.
Hibernate copies the code it uses to do the restore into a single
page that it knows won't be overwritten, and maps it with page tables
built from pages that won't be overwritten.
Today the address it uses for this mapping is arbitrary, but to allow
kexec to reuse this code, it needs to be idmapped. To idmap the page
we must avoid the kernel helpers that have VA_BITS baked in.
Convert create_single_mapping() to take a single PA, and idmap it.
The page tables are built in the reverse order to normal using
pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
to cover 48bits, or 52 if the copy code has bits 52:48 in its PA.
Signed-off-by: James Morse <james.morse@arm.com>
[Adopted the original patch from James to trans_pgd interface, so it can be
commonly used by both Kexec and Hibernate. Some minor clean-ups.]
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/linux-arm-kernel/20200115143322.214247-4-james.morse@arm.com/
Link: https://lore.kernel.org/r/20210125191923.1060122-9-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Make trans_pgd_create_copy and its subroutines to use allocator that is
passed as an argument
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-6-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
kexec is going to use a different allocator, so make
trans_pgd_map_page to accept allocator as an argument, and also
kexec is going to use a different map protection, so also pass
it via argument.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Matthias Brugger <mbrugger@suse.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-5-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Now, that we abstracted the required functions move them to a new home.
Later, we will generalize these function in order to be useful outside
of hibernation.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-4-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
There should be p4dp used when p4d page is allocated.
This is not a functional issue, but for the logical correctness this
should be fixed.
Fixes: e9f6376858 ("arm64: add support for folded p4d page tables")
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-3-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Currently, dtb_mem is enabled only when CONFIG_KEXEC_FILE is
enabled. This adds ugly ifdefs to c files.
Always enabled dtb_mem, when it is not used, it is NULL.
Change the dtb_mem to phys_addr_t, as it is a physical address.
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20210125191923.1060122-2-pasha.tatashin@soleen.com
Signed-off-by: Will Deacon <will@kernel.org>
Storing a function pointer in hyp now generates relocation information
used at early boot to convert the address to hyp VA. The existing
alternative-based conversion mechanism is therefore obsolete. Remove it
and simplify its users.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-8-dbrazdil@google.com
KVM nVHE code runs under a different VA mapping than the kernel, hence
so far it avoided using absolute addressing because the VA in a constant
pool is relocated by the linker to a kernel VA (see hyp_symbol_addr).
Now the kernel has access to a list of positions that contain a kimg VA
but will be accessed only in hyp execution context. These are generated
by the gen-hyprel build-time tool and stored in .hyp.reloc.
Add early boot pass over the entries and convert the kimg VAs to hyp VAs.
Note that this requires for .hyp* ELF sections to be mapped read-write
at that point.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-6-dbrazdil@google.com
Add a post-processing step to compilation of KVM nVHE hyp code which
calls a custom host tool (gen-hyprel) on the partially linked object
file (hyp sections' names prefixed).
The tool lists all R_AARCH64_ABS64 data relocations targeting hyp
sections and generates an assembly file that will form a new section
.hyp.reloc in the kernel binary. The new section contains an array of
32-bit offsets to the positions targeted by these relocations.
Since these addresses of those positions will not be determined until
linking of `vmlinux`, each 32-bit entry carries a R_AARCH64_PREL32
relocation with addend <section_base_sym> + <r_offset>. The linker of
`vmlinux` will therefore fill the slot accordingly.
This relocation data will be used at runtime to convert the kernel VAs
at those positions to hyp VAs.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-5-dbrazdil@google.com
We will need to recognize pointers in .rodata specific to hyp, so
establish a .hyp.rodata ELF section. Merge it with the existing
.hyp.data..ro_after_init as they are treated the same at runtime.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20210105180541.65031-3-dbrazdil@google.com
The AMU counters won't get used today if the cpufreq driver is built as
a module as the amu core requires everything to be ready by late init.
Fix that properly by registering for cpufreq policy notifier. Note that
the amu core don't have any cpufreq dependency after the first time
CPUFREQ_CREATE_POLICY notifier is called for all the CPUs. And so we
don't need to do anything on the CPUFREQ_REMOVE_POLICY notifier. And for
the same reason we check if the CPUs are already parsed in the beginning
of amu_fie_setup() and skip if that is true. Alternatively we can shoot
a work from there to unregister the notifier instead, but that seemed
too much instead of this simple check.
While at it, convert the print message to pr_debug instead of pr_info.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Link: https://lore.kernel.org/r/89c1921334443e133c9c8791b4693607d65ed9f5.1610104461.git.viresh.kumar@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
This patch does a couple of optimizations in init_amu_fie(), like early
exits from paths where we don't need to continue any further, avoid the
enable/disable dance, moving the calls to
topology_scale_freq_invariant() just when we need them, instead of at
the top of the routine, and avoiding calling it for the third time.
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Link: https://lore.kernel.org/r/a732e71ab9ec28c354eb28dd898c9b47d490863f.1610104461.git.viresh.kumar@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
Every time I have stumbled upon this routine, I get confused with the
way 'have_policy' is used and I have to dig in to understand why is it
so. Here is an attempt to make it easier to understand, and hopefully it
is an improvement.
The 'have_policy' check was just an optimization to avoid writing
to amu_fie_cpus in case we don't have to, but that optimization itself
is creating more confusion than the real work. Lets just do that if all
the CPUs support AMUs. It is much cleaner that way.
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Link: https://lore.kernel.org/r/c125766c4be93461772015ac7c9a6ae45d5756f6.1610104461.git.viresh.kumar@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
When entering an exception from EL0, the entry code creates a synthetic
frame record with a NULL PC. This was used by the code introduced in
commit:
7326749801 ("arm64: unwind: reference pt_regs via embedded stack frame")
... to discover exception entries on the stack and dump the associated
pt_regs. Since the NULL PC was undesirable for the stacktrace, we added
a special case to unwind_frame() to prevent the NULL PC from being
logged.
Since commit:
a25ffd3a63 ("arm64: traps: Don't print stack or raw PC/LR values in backtraces")
... we no longer try to dump the pt_regs as part of a stacktrace, and
hence no longer need the synthetic exception record.
This patch removes the synthetic exception record and the associated
special case in unwind_frame(). Instead, EL0 exceptions set the FP to
NULL, as is the case for other terminal records (e.g. when a kernel
thread starts). The synthetic record for exceptions from EL1 is
retrained as this has useful unwind information for the interrupted
context.
To make the terminal case a bit clearer, an explicit check is added to
the start of unwind_frame(). This would otherwise be caught implicitly
by the on_accessible_stack() checks.
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20210113173155.43063-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
BSD sed ignores whitespace character escape sequences such as '\t' in
the replacement string, causing this script to produce the following
incorrect output:
#define vdso_offset_sigtrampt0x089c
Changing the hard tab to ' ' causes both BSD and GNU dialects of sed
to produce equivalent output.
Signed-off-by: John Millikin <john@john-millikin.com>
Link: https://lore.kernel.org/r/15147ffb-7e67-b607-266d-f56599ecafd1@john-millikin.com
Signed-off-by: Will Deacon <will@kernel.org>
arm64 descends into each vdso directory twice; first in vdso_prepare,
second during the ordinary build process.
PPC mimicked it and uncovered a problem [1]. In the first descend,
Kbuild directly visits the vdso directories, therefore it does not
inherit subdir-ccflags-y from upper directories.
This means the command line parameters may differ between the two.
If it happens, the offset values in the generated headers might be
different from real offsets of vdso.so in the kernel.
This potential danger should be avoided. The vdso directories are
built in the vdso_prepare stage, so the second descend is unneeded.
[1]: https://lore.kernel.org/linux-kbuild/CAK7LNARAkJ3_-4gX0VA2UkapbOftuzfSTVMBbgbw=HD8n7N+7w@mail.gmail.com/T/#ma10dcb961fda13f36d42d58fa6cb2da988b7e73a
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20201218024540.1102650-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The kbuild test robot reports that when building with W=1, GCC will warn
for a couple of missing prototypes in syscall.c:
| arch/arm64/kernel/syscall.c:157:6: warning: no previous prototype for 'do_el0_svc' [-Wmissing-prototypes]
| 157 | void do_el0_svc(struct pt_regs *regs)
| | ^~~~~~~~~~
| arch/arm64/kernel/syscall.c:164:6: warning: no previous prototype for 'do_el0_svc_compat' [-Wmissing-prototypes]
| 164 | void do_el0_svc_compat(struct pt_regs *regs)
| | ^~~~~~~~~~~~~~~~~
While this isn't a functional problem, as a general policy we should
include the prototype for functions wherever possible to catch any
accidental divergence between the prototype and implementation. Here we
can easily include <asm/exception.h>, so let's do so.
While there are a number of warnings elsewhere and some warnings enabled
under W=1 are of questionable benefit, this change helps to make the
code more robust as it evolved and reduces the noise somewhat, so it
seems worthwhile.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: kernel test robot <lkp@intel.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/202101141046.n8iPO3mw-lkp@intel.com
Link: https://lore.kernel.org/r/20210114124812.17754-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This is a preparatory patch for unifying numa implementation between
ARM64 & RISC-V. As the numa implementation will be moved to generic
code, rename the arm64 related functions to a generic one.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Disable LTO for the vDSO by filtering out CC_FLAGS_LTO, as there's no
point in using link-time optimization for the small amount of C code.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201211184633.3213045-15-samitolvanen@google.com
S_FRAME_SIZE is the size of the pt_regs structure, no longer the size of
the kernel stack frame, the name is misleading. In keeping with arm32,
rename S_FRAME_SIZE to PT_REGS_SIZE.
Signed-off-by: Jianlin Lv <Jianlin.Lv@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20210112015813.2340969-1-Jianlin.Lv@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This reverts commit 367c820ef0.
lockup_detector_init() makes heavy use of per-cpu variables and must be
called with preemption disabled. Usually, it's handled early during boot
in kernel_init_freeable(), before SMP has been initialised.
Since we do not know whether or not our PMU interrupt can be signalled
as an NMI until considerably later in the boot process, the Arm PMU
driver attempts to re-initialise the lockup detector off the back of a
device_initcall(). Unfortunately, this is called from preemptible
context and results in the following splat:
| BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
| caller is debug_smp_processor_id+0x20/0x2c
| CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.10.0+ #276
| Hardware name: linux,dummy-virt (DT)
| Call trace:
| dump_backtrace+0x0/0x3c0
| show_stack+0x20/0x6c
| dump_stack+0x2f0/0x42c
| check_preemption_disabled+0x1cc/0x1dc
| debug_smp_processor_id+0x20/0x2c
| hardlockup_detector_event_create+0x34/0x18c
| hardlockup_detector_perf_init+0x2c/0x134
| watchdog_nmi_probe+0x18/0x24
| lockup_detector_init+0x44/0xa8
| armv8_pmu_driver_init+0x54/0x78
| do_one_initcall+0x184/0x43c
| kernel_init_freeable+0x368/0x380
| kernel_init+0x1c/0x1cc
| ret_from_fork+0x10/0x30
Rather than bodge this with raw_smp_processor_id() or randomly disabling
preemption, simply revert the culprit for now until we figure out how to
do this properly.
Reported-by: Lecopzer Chen <lecopzer.chen@mediatek.com>
Signed-off-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/20201221162249.3119-1-lecopzer.chen@mediatek.com
Link: https://lore.kernel.org/r/20210112221855.10666-1-will@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
All EL0 returns go via ret_to_user(), which masks IRQs and notifies
lockdep and tracing before calling into do_notify_resume(). Therefore,
there's no need for do_notify_resume() to call trace_hardirqs_off(), and
the comment is stale. The call is simply redundant.
In ret_to_user() we call exit_to_user_mode(), which notifies lockdep and
tracing the IRQs will be enabled in userspace, so there's no need for
el0_svc_common() to call trace_hardirqs_on() before returning. Further,
at the start of ret_to_user() we call trace_hardirqs_off(), so not only
is this redundant, but it is immediately undone.
In addition to being redundant, the trace_hardirqs_on() in
el0_svc_common() leaves lockdep inconsistent with the hardware state,
and is liable to cause issues for any C code or instrumentation
between this and the call to trace_hardirqs_off() which undoes it in
ret_to_user().
This patch removes the redundant tracing calls and associated stale
comments.
Fixes: 23529049c6 ("arm64: entry: fix non-NMI user<->kernel transitions")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210107145310.44616-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* Fixes for the new scalable MMU
* Fixes for migration of nested hypervisors on AMD
* Fix for clang integrated assembler
* Fix for left shift by 64 (UBSAN)
* Small cleanups
* Straggler SEV-ES patch
ARM:
* VM init cleanups
* PSCI relay cleanups
* Kill CONFIG_KVM_ARM_PMU
* Fixup __init annotations
* Fixup reg_to_encoding()
* Fix spurious PMCR_EL0 access
* selftests cleanups
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/4YOMUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMg0Qf/eGDOoLL18KhA9MpoPXusR5kU+82G
AcbFj9siNW46EF7mL+sw/xAx+gZaqSpIEmn/f6BzgiaUBdFTv9CKX3B54e43e59G
HAKD0NpqwvDIi1b0T6bcgC92mY3Qx/IDCc7/9JYjBs/iORfqyWW6xVtkF/Gfymxt
eK+MnfMqqNZODgR/cZnCH1E48fuwOvRMxLqilLi3OOMSUfs2cQOSLTNfYQYqjeaJ
dsQ4YeyPJO5JHtfHFr6VPIo/jDhowniac9CNvOomWWVIx2zPYVSl9d8ub6ESEPNF
GM7UeBCOMmZG/a3qFEZPAUJ7znW4yYE85Z6pjxlrGhd1I54MJi4dd+RApw==
=5Nfj
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for the new scalable MMU
- Fixes for migration of nested hypervisors on AMD
- Fix for clang integrated assembler
- Fix for left shift by 64 (UBSAN)
- Small cleanups
- Straggler SEV-ES patch
ARM:
- VM init cleanups
- PSCI relay cleanups
- Kill CONFIG_KVM_ARM_PMU
- Fixup __init annotations
- Fixup reg_to_encoding()
- Fix spurious PMCR_EL0 access
Misc:
- selftests cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (38 commits)
KVM: x86: __kvm_vcpu_halt can be static
KVM: SVM: Add support for booting APs in an SEV-ES guest
KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit
KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode
KVM: nSVM: correctly restore nested_run_pending on migration
KVM: x86/mmu: Clarify TDP MMU page list invariants
KVM: x86/mmu: Ensure TDP MMU roots are freed after yield
kvm: check tlbs_dirty directly
KVM: x86: change in pv_eoi_get_pending() to make code more readable
MAINTAINERS: Really update email address for Sean Christopherson
KVM: x86: fix shift out of bounds reported by UBSAN
KVM: selftests: Implement perf_test_util more conventionally
KVM: selftests: Use vm_create_with_vcpus in create_vm
KVM: selftests: Factor out guest mode code
KVM/SVM: Remove leftover __svm_vcpu_run prototype from svm.c
KVM: SVM: Add register operand to vmsave call in sev_es_vcpu_load
KVM: x86/mmu: Optimize not-present/MMIO SPTE check in get_mmio_spte()
KVM: x86/mmu: Use raw level to index into MMIO walks' sptes array
KVM: x86/mmu: Get root level from walkers when retrieving MMIO SPTE
KVM: x86/mmu: Use -1 to flag an undefined spte in get_mmio_spte()
...
Currently with ld.lld we emit an empty .eh_frame_hdr section (and a
corresponding program header) into the vDSO. With ld.bfd the section
is not emitted but the program header is, with p_vaddr set to 0. This
can lead to unwinders attempting to interpret the data at whichever
location the program header happens to point to as an unwind info
header. This happens to be mostly harmless as long as the byte at
that location (interpreted as a version number) has a value other
than 1, causing both libgcc and LLVM libunwind to ignore the section
(in libunwind's case, after printing an error message to stderr),
but it could lead to worse problems if the byte happened to be 1 or
the program header points to non-readable memory (e.g. if the empty
section was placed at a page boundary).
Instead of disabling .eh_frame_hdr via --no-eh-frame-hdr (which
also has the downside of being unsupported by older versions of GNU
binutils), disable it by discarding the section, and stop emitting
the program header that points to it.
I understand that we intend to emit valid unwind info for the vDSO
at some point. Once that happens this patch can be reverted.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://linux-review.googlesource.com/id/If745fd9cadcb31b4010acbf5693727fe111b0863
Link: https://lore.kernel.org/r/20201230221954.2007257-1-pcc@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arch/arm64/kernel/smp.c: In function ‘arch_show_interrupts’:
arch/arm64/kernel/smp.c:808:16: warning: unused variable ‘irq’ [-Wunused-variable]
808 | unsigned int irq = irq_desc_get_irq(ipi_desc[i]);
| ^~~
The removal of the last user forgot to remove the variable.
Fixes: 5089bc51f8 ("arm64/smp: Use irq_desc_kstat_cpu() in arch_show_interrupts()")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20201215103026.2872532-1-geert+renesas@glider.be
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
accesses, inefficient and disfunctional code. The goal is to remove the
export of irq_to_desc() to prevent these things from creeping up again.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/ifgsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYm6EACAo8sObkuY3oWLagtGj1KHxon53oGZ
VfDw2LYKM+rgJjDWdiyocyxQU5gtm6loWCrIHjH2adRQ4EisB5r8hfI8NZHxNMyq
8khUi822NRBfFN6SCpO8eW9o95euscNQwCzqi7gV9/U/BAKoDoSEYzS4y0YmJlup
mhoikkrFiBuFXplWI0gbP4ihb8S/to2+kTL6o7eBoJY9+fSXIFR3erZ6f3fLjYZG
CQUUysTywdDhLeDkC9vaesXwgdl2XnaPRwcQqmK8Ez0QYNYpawyILUHLD75cIHDu
bHdK2ZoDv/wtad/3BoGTK3+wChz20a/4/IAnBIUVgmnSLsPtW8zNEOPWNNc0aGg+
rtafi5bvJ1lMoSZhkjLWQDOGU6vFaXl9NkC2fpF+dg1skFMT2CyLC8LD/ekmocon
zHAPBva9j3m2A80hI3dUH9azo/IOl1GHG8ccM6SCxY3S/9vWSQChNhQDLe25xBEO
VtKZS7DYFCRiL8mIy9GgwZWof8Vy2iMua2ML+W9a3mC9u3CqSLbCFmLMT/dDoXl1
oHnMdAHk1DRatA8pJAz83C75RxbAS2riGEqtqLEQ6OaNXn6h0oXCanJX9jdKYDBh
z6ijWayPSRMVktN6FDINsVNFe95N4GwYcGPfagIMqyMMhmJDic6apEzEo7iA76lk
cko28MDqTIK4UQ==
=BXv+
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"This is the second attempt after the first one failed miserably and
got zapped to unblock the rest of the interrupt related patches.
A treewide cleanup of interrupt descriptor (ab)use with all sorts of
racy accesses, inefficient and disfunctional code. The goal is to
remove the export of irq_to_desc() to prevent these things from
creeping up again"
* tag 'irq-core-2020-12-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
genirq: Restrict export of irq_to_desc()
xen/events: Implement irq distribution
xen/events: Reduce irq_info:: Spurious_cnt storage size
xen/events: Only force affinity mask for percpu interrupts
xen/events: Use immediate affinity setting
xen/events: Remove disfunct affinity spreading
xen/events: Remove unused bind_evtchn_to_irq_lateeoi()
net/mlx5: Use effective interrupt affinity
net/mlx5: Replace irq_to_desc() abuse
net/mlx4: Use effective interrupt affinity
net/mlx4: Replace irq_to_desc() abuse
PCI: mobiveil: Use irq_data_get_irq_chip_data()
PCI: xilinx-nwl: Use irq_data_get_irq_chip_data()
NTB/msi: Use irq_has_action()
mfd: ab8500-debugfs: Remove the racy fiddling with irq_desc
pinctrl: nomadik: Use irq_has_action()
drm/i915/pmu: Replace open coded kstat_irqs() copy
drm/i915/lpe_audio: Remove pointless irq_to_desc() usage
s390/irq: Use irq_desc_kstat_cpu() in show_msi_interrupt()
parisc/irq: Use irq_desc_kstat_cpu() in show_interrupts()
...
There's a config option CONFIG_KASAN_STACK that has to be enabled for
KASAN to use stack instrumentation and perform validity checks for
stack variables.
There's no need to unpoison stack when CONFIG_KASAN_STACK is not enabled.
Only call kasan_unpoison_task_stack[_below]() when CONFIG_KASAN_STACK is
enabled.
Note, that CONFIG_KASAN_STACK is an option that is currently always
defined when CONFIG_KASAN is enabled, and therefore has to be tested
with #if instead of #ifdef.
Link: https://lkml.kernel.org/r/d09dd3f8abb388da397fd11598c5edeaa83fe559.1606162397.git.andreyknvl@google.com
Link: https://linux-review.googlesource.com/id/If8a891e9fe01ea543e00b576852685afec0887e3
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide implementation of KASAN functions required for the hardware
tag-based mode. Those include core functions for memory and pointer
tagging (tags_hw.c) and bug reporting (report_tags_hw.c). Also adapt
common KASAN code to support the new mode.
Link: https://lkml.kernel.org/r/cfd0fbede579a6b66755c98c88c108e54f9c56bf.1606161801.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When MTE is present, the GCR_EL1 register contains the tags mask that
allows to exclude tags from the random generation via the IRG instruction.
With the introduction of the new Tag-Based KASAN API that provides a
mechanism to reserve tags for special reasons, the MTE implementation has
to make sure that the GCR_EL1 setting for the kernel does not affect the
userspace processes and viceversa.
Save and restore the kernel/user mask in GCR_EL1 in kernel entry and exit.
Link: https://lkml.kernel.org/r/578b03294708cc7258fad0dc9c2a2e809e5a8214.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The gcr_user mask is a per thread mask that represents the tags that are
excluded from random generation when the Memory Tagging Extension is
present and an 'irg' instruction is invoked.
gcr_user affects the behavior on EL0 only.
Currently that mask is an include mask and it is controlled by the user
via prctl() while GCR_EL1 accepts an exclude mask.
Convert the include mask into an exclude one to make it easier the
register setting.
Note: This change will affect gcr_kernel (for EL1) introduced with a
future patch.
Link: https://lkml.kernel.org/r/946dd31be833b660334c4f93410acf6d6c4cf3c4.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hardware tag-based KASAN relies on Memory Tagging Extension (MTE) feature
and requires it to be enabled. MTE supports
This patch adds a new mte_enable_kernel() helper, that enables MTE in
Synchronous mode in EL1 and is intended to be called from KASAN runtime
during initialization.
The Tag Checking operation causes a synchronous data abort as a
consequence of a tag check fault when MTE is configured in synchronous
mode.
As part of this change enable match-all tag for EL1 to allow the kernel to
access user pages without faulting. This is required because the kernel
does not have knowledge of the tags set by the user in a page.
Note: For MTE, the TCF bit field in SCTLR_EL1 affects only EL1 in a
similar way as TCF0 affects EL0.
MTE that is built on top of the Top Byte Ignore (TBI) feature hence we
enable it as part of this patch as well.
Link: https://lkml.kernel.org/r/7352b0a0899af65c2785416c8ca6bf3845b66fa1.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The hardware tag-based KASAN for compatibility with the other modes stores
the tag associated to a page in page->flags. Due to this the kernel
faults on access when it allocates a page with an initial tag and the user
changes the tags.
Reset the tag associated by the kernel to a page in all the meaningful
places to prevent kernel faults on access.
Note: An alternative to this approach could be to modify page_to_virt().
This though could end up being racy, in fact if a CPU checks the
PG_mte_tagged bit and decides that the page is not tagged but another CPU
maps the same with PROT_MTE and becomes tagged the subsequent kernel
access would fail.
Link: https://lkml.kernel.org/r/9073d4e973747a6f78d5bdd7ebe17f290d087096.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide helper functions to manipulate allocation and pointer tags for
kernel addresses.
Low-level helper functions (mte_assign_*, written in assembly) operate tag
values from the [0x0, 0xF] range. High-level helper functions
(mte_get/set_*) use the [0xF0, 0xFF] range to preserve compatibility with
normal kernel pointers that have 0xFF in their top byte.
MTE_GRANULE_SIZE and related definitions are moved to mte-def.h header
that doesn't have any dependencies and is safe to include into any
low-level header.
Link: https://lkml.kernel.org/r/c31bf759b4411b2d98cdd801eb928e241584fd1f.1606161801.git.andreyknvl@google.com
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Co-developed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Marco Elver <elver@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Computing the hyp VA layout is redundant when the kernel runs in EL2 and
hyp shares its VA mappings. Make calling kvm_compute_layout()
conditional on not just CONFIG_KVM but also !is_kernel_in_hyp_mode().
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201208142452.87237-4-dbrazdil@google.com
* PSCI relay at EL2 when "protected KVM" is enabled
* New exception injection code
* Simplification of AArch32 system register handling
* Fix PMU accesses when no PMU is enabled
* Expose CSV3 on non-Meltdown hosts
* Cache hierarchy discovery fixes
* PV steal-time cleanups
* Allow function pointers at EL2
* Various host EL2 entry cleanups
* Simplification of the EL2 vector allocation
s390:
* memcg accouting for s390 specific parts of kvm and gmap
* selftest for diag318
* new kvm_stat for when async_pf falls back to sync
x86:
* Tracepoints for the new pagetable code from 5.10
* Catch VFIO and KVM irqfd events before userspace
* Reporting dirty pages to userspace with a ring buffer
* SEV-ES host support
* Nested VMX support for wait-for-SIPI activity state
* New feature flag (AVX512 FP16)
* New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
* Selftest improvements
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl/bdL4UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNgQQgAnTH6rhXa++Zd5F0EM2NwXwz3iEGb
lOq1DZSGjs6Eekjn8AnrWbmVQr+CBCuGU9MrxpSSzNDK/awryo3NwepOWAZw9eqk
BBCVwGBbJQx5YrdgkGC0pDq2sNzcpW/VVB3vFsmOxd9eHblnuKSIxEsCCXTtyqIt
XrLpQ1UhvI4yu102fDNhuFw2EfpzXm+K0Lc0x6idSkdM/p7SyeOxiv8hD4aMr6+G
bGUQuMl4edKZFOWFigzr8NovQAvDHZGrwfihu2cLRYKLhV97QuWVmafv/yYfXcz2
drr+wQCDNzDOXyANnssmviazrhOX0QmTAhbIXGGX/kTxYKcfPi83ZLoI3A==
=ISud
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Much x86 work was pushed out to 5.12, but ARM more than made up for it.
ARM:
- PSCI relay at EL2 when "protected KVM" is enabled
- New exception injection code
- Simplification of AArch32 system register handling
- Fix PMU accesses when no PMU is enabled
- Expose CSV3 on non-Meltdown hosts
- Cache hierarchy discovery fixes
- PV steal-time cleanups
- Allow function pointers at EL2
- Various host EL2 entry cleanups
- Simplification of the EL2 vector allocation
s390:
- memcg accouting for s390 specific parts of kvm and gmap
- selftest for diag318
- new kvm_stat for when async_pf falls back to sync
x86:
- Tracepoints for the new pagetable code from 5.10
- Catch VFIO and KVM irqfd events before userspace
- Reporting dirty pages to userspace with a ring buffer
- SEV-ES host support
- Nested VMX support for wait-for-SIPI activity state
- New feature flag (AVX512 FP16)
- New system ioctl to report Hyper-V-compatible paravirtualization features
Generic:
- Selftest improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (171 commits)
KVM: SVM: fix 32-bit compilation
KVM: SVM: Add AP_JUMP_TABLE support in prep for AP booting
KVM: SVM: Provide support to launch and run an SEV-ES guest
KVM: SVM: Provide an updated VMRUN invocation for SEV-ES guests
KVM: SVM: Provide support for SEV-ES vCPU loading
KVM: SVM: Provide support for SEV-ES vCPU creation/loading
KVM: SVM: Update ASID allocation to support SEV-ES guests
KVM: SVM: Set the encryption mask for the SVM host save area
KVM: SVM: Add NMI support for an SEV-ES guest
KVM: SVM: Guest FPU state save/restore not needed for SEV-ES guest
KVM: SVM: Do not report support for SMM for an SEV-ES guest
KVM: x86: Update __get_sregs() / __set_sregs() to support SEV-ES
KVM: SVM: Add support for CR8 write traps for an SEV-ES guest
KVM: SVM: Add support for CR4 write traps for an SEV-ES guest
KVM: SVM: Add support for CR0 write traps for an SEV-ES guest
KVM: SVM: Add support for EFER write traps for an SEV-ES guest
KVM: SVM: Support string IO operations for an SEV-ES guest
KVM: SVM: Support MMIO for an SEV-ES guest
KVM: SVM: Create trace events for VMGEXIT MSR protocol processing
KVM: SVM: Create trace events for VMGEXIT processing
...
- Work around broken GCC 4.9 handling of "S" asm constraint.
- Suppress W=1 missing prototype warnings.
- Warn the user when a small VA_BITS value cannot map the available
memory.
- Drop the useless update to per-cpu cycles.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl/c8M0ACgkQa9axLQDI
XvF07RAAn/yikfpYuM6UcCZqofz1/zJi/5K0K1gCoZAUnF9JOzt5/6Ds2cmmYcF0
1W0TVMxmHgIrAlo+canNPM4++EDU4QNHEfAauewZUdbvB+YD0CD5qD+yHVmdftBk
YdlfGYB5EZvmoZD40vWtM2N1buKxBhk2d9PLyLcDmk0OlZ1uBcgDMyEBkyrieQeW
O+V/FSr+UltxPNY4gyEJRyB/TyXWF2ndOstd6wZ3pF9DuV8blgs7c0HRfVAGFqm/
rNrj2y/q707ExG+E4j/l+/dvKYSPV9vjmZ1BulbEoM5YsyqN1mQckDfRS/FSwLv1
rxcyxzgtmUR/Wqy/SElmY/3B8zEhHJwbu0TSR1UKadzji460nXAQp2m4rdK1WtvW
KNCYibEoFIXZCVVhDHibACdtAhnFfY/cgbYU1AgZjBsmpnOSFSeUxl+bR0qh/k0t
h+koXaIkmZa+o1FnIbUtR7fkmIlsI6H94xVofQZnoZ/+syGZgmCK+tofvosTjQvN
9E+LEjmifSK36xLZNgel9S1Y1Oa5b1B0tZIth60LTVruJiKb2XaJtcKyEe/Uq+5T
uLAHs58dXEkxPYAqp0umKicERLGw7EdTOKxz+pKXovxpIgJuKZPk8kg39N6zwPNy
sEntAOy1V0zTD+mkrBQc/h86rrsONW+icsK6hpxJQRWKQ7egO+U=
=DKWC
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Catalin Marinas:
"These are some some trivial updates that mostly fix/clean-up code
pushed during the merging window:
- Work around broken GCC 4.9 handling of "S" asm constraint
- Suppress W=1 missing prototype warnings
- Warn the user when a small VA_BITS value cannot map the available
memory
- Drop the useless update to per-cpu cycles"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Work around broken GCC 4.9 handling of "S" constraint
arm64: Warn the user when a small VA_BITS value wastes memory
arm64: entry: suppress W=1 prototype warnings
arm64: topology: Drop the useless update to per-cpu cycles
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/YJxsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpjpyEACBdW+YjenjTbkUPeEXzQgkBkTZUYw3g007
DPcUT1g8PQZXYXlQvBKCvGhhIr7/KVcjepKoowiNQfBNGcIPJTVopW58nzpqAfTQ
goI2WYGn5EKFFKBPvtH04cJD/Wo8muXdxynKtqyZbnGGgZjQxPrE259b8dpHjBSR
6L7HHkk0D1oU/5b6h6Ocpg9mc/0iIUCZylySAYY3eGO0JaVPJaXgZSJZYgHxCHll
Lb+/y/fXdtm/0PmQ3ko0ev54g3yEWqZIX0NsZW1asrButIy+KLzQ2Mz1xFLFDMag
prtIfwb8tzgc4dFPY090C/azjCh5CPpxqYS6FkRwS0p86n6OhkyXrqfily5Hs4/B
NC7CBPBSH/j+NKUK7CYZcpTzTpxPjUr9p0anUdlvMJz8FhTb/3YEEZ1UTeWOeHmk
Yo5SxnFghLeZZeZ1ok6rdymnVa7WEX12SCLGQX31BB2mld0tNbKb4b+FsBF6OUMk
IUaX6OjwDFVRaysC88BQ4hjcIP1HxsViG4/VZDX15gjAAH2Pvb+7tev+lcDcOhjz
TCD4GNFspTFzRhh9nT7oxQ679qCh9G9zHbzuIRewnrS6iqvo5SJQB3dR2yrWZRRH
ySkQFiHpYOlnLJYv0jg9COlGwo2FUdcvKhCvkjQKKBz48rzW/IC0LwKdRQWZDFk3
FKGzP/NBig==
=cadT
-----END PGP SIGNATURE-----
Merge tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block
Pull TIF_NOTIFY_SIGNAL updates from Jens Axboe:
"This sits on top of of the core entry/exit and x86 entry branch from
the tip tree, which contains the generic and x86 parts of this work.
Here we convert the rest of the archs to support TIF_NOTIFY_SIGNAL.
With that done, we can get rid of JOBCTL_TASK_WORK from task_work and
signal.c, and also remove a deadlock work-around in io_uring around
knowing that signal based task_work waking is invoked with the sighand
wait queue head lock.
The motivation for this work is to decouple signal notify based
task_work, of which io_uring is a heavy user of, from sighand. The
sighand lock becomes a huge contention point, particularly for
threaded workloads where it's shared between threads. Even outside of
threaded applications it's slower than it needs to be.
Roman Gershman <romger@amazon.com> reported that his networked
workload dropped from 1.6M QPS at 80% CPU to 1.0M QPS at 100% CPU
after io_uring was changed to use TIF_NOTIFY_SIGNAL. The time was all
spent hammering on the sighand lock, showing 57% of the CPU time there
[1].
There are further cleanups possible on top of this. One example is
TIF_PATCH_PENDING, where a patch already exists to use
TIF_NOTIFY_SIGNAL instead. Hopefully this will also lead to more
consolidation, but the work stands on its own as well"
[1] https://github.com/axboe/liburing/issues/215
* tag 'tif-task_work.arch-2020-12-14' of git://git.kernel.dk/linux-block: (28 commits)
io_uring: remove 'twa_signal_ok' deadlock work-around
kernel: remove checking for TIF_NOTIFY_SIGNAL
signal: kill JOBCTL_TASK_WORK
io_uring: JOBCTL_TASK_WORK is no longer used by task_work
task_work: remove legacy TWA_SIGNAL path
sparc: add support for TIF_NOTIFY_SIGNAL
riscv: add support for TIF_NOTIFY_SIGNAL
nds32: add support for TIF_NOTIFY_SIGNAL
ia64: add support for TIF_NOTIFY_SIGNAL
h8300: add support for TIF_NOTIFY_SIGNAL
c6x: add support for TIF_NOTIFY_SIGNAL
alpha: add support for TIF_NOTIFY_SIGNAL
xtensa: add support for TIF_NOTIFY_SIGNAL
arm: add support for TIF_NOTIFY_SIGNAL
microblaze: add support for TIF_NOTIFY_SIGNAL
hexagon: add support for TIF_NOTIFY_SIGNAL
csky: add support for TIF_NOTIFY_SIGNAL
openrisc: add support for TIF_NOTIFY_SIGNAL
sh: add support for TIF_NOTIFY_SIGNAL
um: add support for TIF_NOTIFY_SIGNAL
...
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
Don't allow splitting of vm_special_mapping's. It affects vdso/vvar
areas. Uprobes have only one page in xol_area so they aren't affected.
Those restrictions were enforced by checks in .mremap() callbacks.
Restrict resizing with generic .split() callback.
Link: https://lkml.kernel.org/r/20201013013416.390574-7-dima@arista.com
Signed-off-by: Dmitry Safonov <dima@arista.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The previous call to update_freq_counters_refs() has already updated the
per-cpu variables, don't overwrite them with the same value again.
Fixes: 4b9cf23c17 ("arm64: wrap and generalise counter read functions")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/7a171f710cdc0f808a2bfbd7db839c0d265527e7.1607579234.git.viresh.kumar@linaro.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- migrate_disable/enable() support which originates from the RT tree and
is now a prerequisite for the new preemptible kmap_local() API which aims
to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XwK4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoX28D/9cVrvziSQGfBfuQWnUiw8iOIq1QBa2
Me+Tvenhfrlt7xU6rbP9ciFu7eTN+fS06m5uQPGI+t22WuJmHzbmw1bJVXfkvYfI
/QoU+Hg7DkDAn1p7ZKXh0dRkV0nI9ixxSHl0E+Zf1ATBxCUMV2SO85flg6z/4qJq
3VWUye0dmR7/bhtkIjv5rwce9v2JB2g1AbgYXYTW9lHVoUdGoMSdiZAF4tGyHLnx
sJ6DMqQ+k+dmPyYO0z5MTzjW/fXit4n9w2e3z9TvRH/uBu58WSW1RBmQYX6aHBAg
dhT9F4lvTs6lJY23x5RSFWDOv6xAvKF5a0xfb8UZcyH5EoLYrPRvm42a0BbjdeRa
u0z7LbwIlKA+RFdZzFZWz8UvvO0ljyMjmiuqZnZ5dY9Cd80LSBuxrWeQYG0qg6lR
Y2povhhCepEG+q8AXIe2YjHKWKKC1s/l/VY3CNnCzcd21JPQjQ4Z5eWGmHif5IED
CntaeFFhZadR3w02tkX35zFmY3w4soKKrbI4EKWrQwd+cIEQlOSY7dEPI/b5BbYj
MWAb3P4EG9N77AWTNmbhK4nN0brEYb+rBbCA+5dtNBVhHTxAC7OTWElJOC2O66FI
e06dREjvwYtOkRUkUguWwErbIai2gJ2MH0VILV3hHoh64oRk7jjM8PZYnjQkdptQ
Gsq0rJW5iiu/OQ==
=Oz1V
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
- Expose tag address bits in siginfo. The original arm64 ABI did not
expose any of the bits 63:56 of a tagged address in siginfo. In the
presence of user ASAN or MTE, this information may be useful. The
implementation is generic to other architectures supporting tags (like
SPARC ADI, subject to wiring up the arch code). The user will have to
opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra bits, if
available, become visible in si_addr.
- Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
detriment of other platforms. With these changes, the kernel scans the
Device Tree dma-ranges and the ACPI IORT information before deciding
on a smaller ZONE_DMA.
- Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
with LTO, there is an increased risk of the compiler converting an
address dependency headed by a READ_ONCE() invocation into a control
dependency and consequently allowing for harmful reordering by the
CPU.
- Add CPPC FFH support using arm64 AMU counters.
- set_fs() removal on arm64. This renders the User Access Override (UAO)
ARMv8 feature unnecessary.
- Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
identifier file for SMMUv3, stop event counters support for i.MX8MP,
enable the perf events-based hard lockup detector.
- Reorganise the kernel VA space slightly so that 52-bit VA
configurations can use more virtual address space.
- Improve the robustness of the arm64 memory offline event notifier.
- Pad the Image header to 64K following the EFI header definition
updated recently to increase the section alignment to 64K.
- Support CONFIG_CMDLINE_EXTEND on arm64.
- Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
bits for PtrAuth.
- Switch to vmapped shadow call stacks.
- Miscellaneous clean-ups.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl/XcSgACgkQa9axLQDI
XvGkwg//SLknimELD/cphf2UzZm5RFuCU0x1UnIXs9XYo5BrOpgVLLA//+XkCrKN
0GLAdtBDfw1axWJudzgMBiHrv6wSGh4p3YWjLIW06u/PJu3m3U8oiiolvvF8d7Yq
UKDseKGQnQkrl97J0SyA+Da/u8D11GEzp52SWL5iRxzt6vInEC27iTOp9n1yoaoP
f3y7qdp9kv831ryUM3rXFYpc8YuMWXk+JpBSNaxqmjlvjMzipA5PhzBLmNzfc657
XcrRX5qsgjEeJW8UUnWUVNB42j7tVzN77yraoUpoVVCzZZeWOQxqq5EscKPfIhRt
AjtSIQNOs95ZVE0SFCTjXnUUb823coUs4dMCdftqlE62JNRwdR+3bkfa+QjPTg1F
O9ohW1AzX0/JB19QBxMaOgbheB8GFXh3DVJ6pizTgxJgyPvQQtFuEhT1kq8Cst0U
Pe+pEWsg9t41bUXNz+/l9tUWKWpeCfFNMTrBXLmXrNlTLeOvDh/0UiF0+2lYJYgf
YAboibQ5eOv2wGCcSDEbNMJ6B2/6GtubDJxH4du680F6Emb6pCSw0ntPwB7mSGLG
5dXz+9FJxDLjmxw7BXxQgc5MoYIrt5JQtaOQ6UxU8dPy53/+py4Ck6tXNkz0+Ap7
gPPaGGy1GqobQFu3qlHtOK1VleQi/sWcrpmPHrpiiFUf6N7EmcY=
=zXFk
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Expose tag address bits in siginfo. The original arm64 ABI did not
expose any of the bits 63:56 of a tagged address in siginfo. In the
presence of user ASAN or MTE, this information may be useful. The
implementation is generic to other architectures supporting tags
(like SPARC ADI, subject to wiring up the arch code). The user will
have to opt in via sigaction(SA_EXPOSE_TAGBITS) so that the extra
bits, if available, become visible in si_addr.
- Default to 32-bit wide ZONE_DMA. Previously, ZONE_DMA was set to the
lowest 1GB to cope with the Raspberry Pi 4 limitations, to the
detriment of other platforms. With these changes, the kernel scans
the Device Tree dma-ranges and the ACPI IORT information before
deciding on a smaller ZONE_DMA.
- Strengthen READ_ONCE() to acquire when CONFIG_LTO=y. When building
with LTO, there is an increased risk of the compiler converting an
address dependency headed by a READ_ONCE() invocation into a control
dependency and consequently allowing for harmful reordering by the
CPU.
- Add CPPC FFH support using arm64 AMU counters.
- set_fs() removal on arm64. This renders the User Access Override
(UAO) ARMv8 feature unnecessary.
- Perf updates: PMU driver for the ARM DMC-620 memory controller, sysfs
identifier file for SMMUv3, stop event counters support for i.MX8MP,
enable the perf events-based hard lockup detector.
- Reorganise the kernel VA space slightly so that 52-bit VA
configurations can use more virtual address space.
- Improve the robustness of the arm64 memory offline event notifier.
- Pad the Image header to 64K following the EFI header definition
updated recently to increase the section alignment to 64K.
- Support CONFIG_CMDLINE_EXTEND on arm64.
- Do not use tagged PC in the kernel (TCR_EL1.TBID1==1), freeing up 8
bits for PtrAuth.
- Switch to vmapped shadow call stacks.
- Miscellaneous clean-ups.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits)
perf/imx_ddr: Add system PMU identifier for userspace
bindings: perf: imx-ddr: add compatible string
arm64: Fix build failure when HARDLOCKUP_DETECTOR_PERF is enabled
arm64: mte: fix prctl(PR_GET_TAGGED_ADDR_CTRL) if TCF0=NONE
arm64: mark __system_matches_cap as __maybe_unused
arm64: uaccess: remove vestigal UAO support
arm64: uaccess: remove redundant PAN toggling
arm64: uaccess: remove addr_limit_user_check()
arm64: uaccess: remove set_fs()
arm64: uaccess cleanup macro naming
arm64: uaccess: split user/kernel routines
arm64: uaccess: refactor __{get,put}_user
arm64: uaccess: simplify __copy_user_flushcache()
arm64: uaccess: rename privileged uaccess routines
arm64: sdei: explicitly simulate PAN/UAO entry
arm64: sdei: move uaccess logic to arch/arm64/
arm64: head.S: always initialize PSTATE
arm64: head.S: cleanup SCTLR_ELx initialization
arm64: head.S: rename el2_setup -> init_kernel_el
arm64: add C wrappers for SET_PSTATE_*()
...
* for-next/kvm-build-fix:
: Fix KVM build issues with 64K pages
KVM: arm64: Fix build error in user_mem_abort()
* for-next/va-refactor:
: VA layout changes
arm64: mm: don't assume struct page is always 64 bytes
Documentation/arm64: fix RST layout of memory.rst
arm64: mm: tidy up top of kernel VA space
arm64: mm: make vmemmap region a projection of the linear region
arm64: mm: extend linear region for 52-bit VA configurations
* for-next/lto:
: Upgrade READ_ONCE() to RCpc acquire on arm64 with LTO
arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y
arm64: alternatives: Remove READ_ONCE() usage during patch operation
arm64: cpufeatures: Add capability for LDAPR instruction
arm64: alternatives: Split up alternative.h
arm64: uaccess: move uao_* alternatives to asm-uaccess.h
* for-next/mem-hotplug:
: Memory hotplug improvements
arm64/mm/hotplug: Ensure early memory sections are all online
arm64/mm/hotplug: Enable MEM_OFFLINE event handling
arm64/mm/hotplug: Register boot memory hot remove notifier earlier
arm64: mm: account for hotplug memory when randomizing the linear region
* for-next/cppc-ffh:
: Add CPPC FFH support using arm64 AMU counters
arm64: abort counter_read_on_cpu() when irqs_disabled()
arm64: implement CPPC FFH support using AMUs
arm64: split counter validation function
arm64: wrap and generalise counter read functions
* for-next/pad-image-header:
: Pad Image header to 64KB and unmap it
arm64: head: tidy up the Image header definition
arm64/head: avoid symbol names pointing into first 64 KB of kernel image
arm64: omit [_text, _stext) from permanent kernel mapping
* for-next/zone-dma-default-32-bit:
: Default to 32-bit wide ZONE_DMA (previously reduced to 1GB for RPi4)
of: unittest: Fix build on architectures without CONFIG_OF_ADDRESS
mm: Remove examples from enum zone_type comment
arm64: mm: Set ZONE_DMA size based on early IORT scan
arm64: mm: Set ZONE_DMA size based on devicetree's dma-ranges
of: unittest: Add test for of_dma_get_max_cpu_address()
of/address: Introduce of_dma_get_max_cpu_address()
arm64: mm: Move zone_dma_bits initialization into zone_sizes_init()
arm64: mm: Move reserve_crashkernel() into mem_init()
arm64: Force NO_BLOCK_MAPPINGS if crashkernel reservation is required
arm64: Ignore any DMA offsets in the max_zone_phys() calculation
* for-next/signal-tag-bits:
: Expose the FAR_EL1 tag bits in siginfo
arm64: expose FAR_EL1 tag bits in siginfo
signal: define the SA_EXPOSE_TAGBITS bit in sa_flags
signal: define the SA_UNSUPPORTED bit in sa_flags
arch: provide better documentation for the arch-specific SA_* flags
signal: clear non-uapi flag bits when passing/returning sa_flags
arch: move SA_* definitions to generic headers
parisc: start using signal-defs.h
parisc: Drop parisc special case for __sighandler_t
* for-next/cmdline-extended:
: Add support for CONFIG_CMDLINE_EXTENDED
arm64: Extend the kernel command line from the bootloader
arm64: kaslr: Refactor early init command line parsing
Conflict resolution gone astray results in the kernel not booting
on VHE-capable HW when VHE support is disabled. Thankfully spotted
by David.
Reported-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
When compiling with __KVM_NVHE_HYPERVISOR__, redefine per_cpu_offset()
to __hyp_per_cpu_offset() which looks up the base of the nVHE per-CPU
region of the given cpu and computes its offset from the
.hyp.data..percpu section.
This enables use of per_cpu_ptr() helpers in nVHE hyp code. Until now
only this_cpu_ptr() was supported by setting TPIDR_EL2.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-14-dbrazdil@google.com
Add rules for renaming the .data..ro_after_init ELF section in KVM nVHE
object files to .hyp.data..ro_after_init, linking it into the kernel
and mapping it in hyp at runtime.
The section is RW to the host, then mapped RO in hyp. The expectation is
that the host populates the variables in the section and they are never
changed by hyp afterwards.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-13-dbrazdil@google.com
MAIR_EL2 and TCR_EL2 are currently initialized from their _EL1 values.
This will not work once KVM starts intercepting PSCI ON/SUSPEND SMCs
and initializing EL2 state before EL1 state.
Obtain the EL1 values during KVM init and store them in the init params
struct. The struct will stay in memory and can be used when booting new
cores.
Take the opportunity to move copying the T0SZ value from idmap_t0sz in
KVM init rather than in .hyp.idmap.text. This avoids the need for the
idmap_t0sz symbol alias.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-12-dbrazdil@google.com
Once we start initializing KVM on newly booted cores before the rest of
the kernel, parameters to __do_hyp_init will need to be provided by EL2
rather than EL1. At that point it will not be possible to pass its three
arguments directly because PSCI_CPU_ON only supports one context
argument.
Refactor __do_hyp_init to accept its parameters in a struct. This
prepares the code for KVM booting cores as well as removes any limits on
the number of __do_hyp_init arguments.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-11-dbrazdil@google.com
When a CPU is booted in EL2, the kernel checks for VHE support and
initializes the CPU core accordingly. For nVHE it also installs the stub
vectors and drops down to EL1.
Once KVM gains the ability to boot cores without going through the
kernel entry point, it will need to initialize the CPU the same way.
Extract the relevant bits of el2_setup into an init_el2_state macro
with an argument specifying whether to initialize for VHE or nVHE.
The following ifdefs are removed:
* CONFIG_ARM_GIC_V3 - always selected on arm64
* CONFIG_COMPAT - hstr_el2 can be set even without 32-bit support
No functional change intended. Size of el2_setup increased by
148 bytes due to duplication.
Signed-off-by: David Brazdil <dbrazdil@google.com>
[maz: reworked to fit the new PSTATE initial setup code]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-9-dbrazdil@google.com
CPU index should never be negative. Change the signature of
(set_)cpu_logical_map to take an unsigned int.
This still works even if the users treat the CPU index as an int,
and will allow the hypervisor's implementation to check that the index
is valid with a single upper-bound check.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-8-dbrazdil@google.com
Expose the boolean value whether the system is running with KVM in
protected mode (nVHE + kernel param). CPU capability was selected over
a global variable to allow use in alternatives.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201202184122.26046-3-dbrazdil@google.com
Now that the PAN toggling has been removed, the only user of
__system_matches_cap() is has_generic_auth(), which is only built when
CONFIG_ARM64_PTR_AUTH is selected, and Qian reports that this results in
a build-time warning when CONFIG_ARM64_PTR_AUTH is not selected:
| arch/arm64/kernel/cpufeature.c:2649:13: warning: '__system_matches_cap' defined but not used [-Wunused-function]
| static bool __system_matches_cap(unsigned int n)
| ^~~~~~~~~~~~~~~~~~~~
It's tricky to restructure things to prevent this, so let's mark
__system_matches_cap() as __maybe_unused, as we used to do for the other
user of __system_matches_cap() which we just removed.
Reported-by: Qian Cai <qcai@redhat.com>
Suggested-by: Qian Cai <qcai@redhat.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20201203152403.26100-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Fix numerous issues with instrumentation and exception entry
- Fix hideous typo in unused register field definition
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl/HvbwQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNFTMB/oD0ucfP6CH65w+7Sbsv2L8FABfYzSrA9gP
f1cmeh1+MyRN4Nbx2ves5wcRGoX1CgZ8KFAmLXG6yyn7UDA/q27CTELknwobhOft
tQIPB2hFDW9qq3VBXFReL3aoXLnWUiRL3nBxQFt7LG1Xor/ivEb1ZFht351UklDh
u1P6NVptpjXFuGPvdqxkHo2WzT0QHI57MRuc1l7I1FRo4dV1nKSlwohu0Ydii4q9
8oLhx77Ga1SWK80IztNmpo7CSMP/FLGDwbUE3vAaftUJx5CBt+lYR1CeWNACSEvy
22y7CkJWKGQccG62oHI7zQaZm1+fum70ndP5dDlfQW/BcCaz8vRH
=KSDc
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I'm sad to say that we've got an unusually large arm64 fixes pull for
rc7 which addresses numerous significant instrumentation issues with
our entry code.
Without these patches, lockdep is hopelessly unreliable in some
configurations [1,2] and syzkaller is therefore not a lot of use
because it's so noisy.
Although much of this has always been broken, it appears to have been
exposed more readily by other changes such as 044d0d6de9 ("lockdep:
Only trace IRQ edges") and general lockdep improvements around IRQ
tracing and NMIs.
Fixing this properly required moving much of the instrumentation hooks
from our entry assembly into C, which Mark has been working on for the
last few weeks. We're not quite ready to move to the recently added
generic functions yet, but the code here has been deliberately written
to mimic that closely so we can look at cleaning things up once we
have a bit more breathing room.
Having said all that, the second version of these patches was posted
last week and I pushed it into our CI (kernelci and cki) along with a
commit which forced on PROVE_LOCKING, NOHZ_FULL and
CONTEXT_TRACKING_FORCE. The result? We found a real bug in the
md/raid10 code [3].
Oh, and there's also a really silly typo patch that's unrelated.
Summary:
- Fix numerous issues with instrumentation and exception entry
- Fix hideous typo in unused register field definition"
[1] https://lore.kernel.org/r/CACT4Y+aAzoJ48Mh1wNYD17pJqyEcDnrxGfApir=-j171TnQXhw@mail.gmail.com
[2] https://lore.kernel.org/r/20201119193819.GA2601289@elver.google.com
[3] https://lore.kernel.org/r/94c76d5e-466a-bc5f-e6c2-a11b65c39f83@redhat.com
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: mte: Fix typo in macro definition
arm64: entry: fix EL1 debug transitions
arm64: entry: fix NMI {user, kernel}->kernel transitions
arm64: entry: fix non-NMI kernel<->kernel transitions
arm64: ptrace: prepare for EL1 irq/rcu tracking
arm64: entry: fix non-NMI user<->kernel transitions
arm64: entry: move el1 irq/nmi logic to C
arm64: entry: prepare ret_to_user for function call
arm64: entry: move enter_from_user_mode to entry-common.c
arm64: entry: mark entry code as noinstr
arm64: mark idle code as noinstr
arm64: syscall: exit userspace before unmasking exceptions
Now that arm64 no longer uses UAO, remove the vestigal feature detection
code and Kconfig text.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-13-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Some code (e.g. futex) needs to make privileged accesses to userspace
memory, and uses uaccess_{enable,disable}_privileged() in order to
permit this. All other uaccess primitives use LDTR/STTR, and never need
to toggle PAN.
Remove the redundant PAN toggling.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-12-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that set_fs() is gone, addr_limit_user_check() is redundant. Remove
the checks and associated thread flag.
To ensure that _TIF_WORK_MASK can be used as an immediate value in an
AND instruction (as it is in `ret_to_user`), TIF_MTE_ASYNC_FAULT is
renumbered to keep the constituent bits of _TIF_WORK_MASK contiguous.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-11-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Now that the uaccess primitives dont take addr_limit into account, we
have no need to manipulate this via set_fs() and get_fs(). Remove
support for these, along with some infrastructure this renders
redundant.
We no longer need to flip UAO to access kernel memory under KERNEL_DS,
and head.S unconditionally clears UAO for all kernel configurations via
an ERET in init_kernel_el. Thus, we don't need to dynamically flip UAO,
nor do we need to context-switch it. However, we still need to adjust
PAN during SDEI entry.
Masking of __user pointers no longer needs to use the dynamic value of
addr_limit, and can use a constant derived from the maximum possible
userspace task size. A new TASK_SIZE_MAX constant is introduced for
this, which is also used by core code. In configurations supporting
52-bit VAs, this may include a region of unusable VA space above a
48-bit TTBR0 limit, but never includes any portion of TTBR1.
Note that TASK_SIZE_MAX is an exclusive limit, while USER_DS and
KERNEL_DS were inclusive limits, and is converted to a mask by
subtracting one.
As the SDEI entry code repurposes the otherwise unnecessary
pt_regs::orig_addr_limit field to store the TTBR1 of the interrupted
context, for now we rename that to pt_regs::sdei_ttbr1. In future we can
consider factoring that out.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We currently have many uaccess_*{enable,disable}*() variants, which
subsequent patches will cut down as part of removing set_fs() and
friends. Once this simplification is made, most uaccess routines will
only need to ensure that the user page tables are mapped in TTBR0, as is
currently dealt with by uaccess_ttbr0_{enable,disable}().
The existing uaccess_{enable,disable}() routines ensure that user page
tables are mapped in TTBR0, and also disable PAN protections, which is
necessary to be able to use atomics on user memory, but also permit
unrelated privileged accesses to access user memory.
As preparatory step, let's rename uaccess_{enable,disable}() to
uaccess_{enable,disable}_privileged(), highlighting this caveat and
discouraging wider misuse. Subsequent patches can reuse the
uaccess_{enable,disable}() naming for the common case of ensuring the
user page tables are mapped in TTBR0.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for removing addr_limit and set_fs() we must decouple the
SDEI PAN/UAO manipulation from the uaccess code, and explicitly
reinitialize these as required.
SDEI enters the kernel with a non-architectural exception, and prior to
the most recent revision of the specification (ARM DEN 0054B), PSTATE
bits (e.g. PAN, UAO) are not manipulated in the same way as for
architectural exceptions. Notably, older versions of the spec can be
read ambiguously as to whether PSTATE bits are inherited unchanged from
the interrupted context or whether they are generated from scratch, with
TF-A doing the latter.
We have three cases to consider:
1) The existing TF-A implementation of SDEI will clear PAN and clear UAO
(along with other bits in PSTATE) when delivering an SDEI exception.
2) In theory, implementations of SDEI prior to revision B could inherit
PAN and UAO (along with other bits in PSTATE) unchanged from the
interrupted context. However, in practice such implementations do not
exist.
3) Going forward, new implementations of SDEI must clear UAO, and
depending on SCTLR_ELx.SPAN must either inherit or set PAN.
As we can ignore (2) we can assume that upon SDEI entry, UAO is always
clear, though PAN may be clear, inherited, or set per SCTLR_ELx.SPAN.
Therefore, we must explicitly initialize PAN, but do not need to do
anything for UAO.
Considering what we need to do:
* When set_fs() is removed, force_uaccess_begin() will have no HW
side-effects. As this only clears UAO, which we can assume has already
been cleared upon entry, this is not a problem. We do not need to add
code to manipulate UAO explicitly.
* PAN may be cleared upon entry (in case 1 above), so where a kernel is
built to use PAN and this is supported by all CPUs, the kernel must
set PAN upon entry to ensure expected behaviour.
* PAN may be inherited from the interrupted context (in case 3 above),
and so where a kernel is not built to use PAN or where PAN support is
not uniform across CPUs, the kernel must clear PAN to ensure expected
behaviour.
This patch reworks the SDEI code accordingly, explicitly setting PAN to
the expected state in all cases. To cater for the cases where the kernel
does not use PAN or this is not uniformly supported by hardware we add a
new cpu_has_pan() helper which can be used regardless of whether the
kernel is built to use PAN.
The existing system_uses_ttbr0_pan() is redefined in terms of
system_uses_hw_pan() both for clarity and as a minor optimization when
HW PAN is not selected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The SDEI support code is split across arch/arm64/ and drivers/firmware/,
largley this is split so that the arch-specific portions are under
arch/arm64, and the management logic is under drivers/firmware/.
However, exception entry fixups are currently under drivers/firmware.
Let's move the exception entry fixups under arch/arm64/. This
de-clutters the management logic, and puts all the arch-specific
portions in one place. Doing this also allows the fixups to be applied
earlier, so things like PAN and UAO will be in a known good state before
we run other logic. This will also make subsequent refactoring easier.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201202131558.39270-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As with SCTLR_ELx and other control registers, some PSTATE bits are
UNKNOWN out-of-reset, and we may not be able to rely on hardware or
firmware to initialize them to our liking prior to entry to the kernel,
e.g. in the primary/secondary boot paths and return from idle/suspend.
It would be more robust (and easier to reason about) if we consistently
initialized PSTATE to a default value, as we do with control registers.
This will ensure that the kernel is not adversely affected by bits it is
not aware of, e.g. when support for a feature such as PAN/UAO is
disabled.
This patch ensures that PSTATE is consistently initialized at boot time
via an ERET. This is not intended to relax the existing requirements
(e.g. DAIF bits must still be set prior to entering the kernel). For
features detected dynamically (which may require system-wide support),
it is still necessary to subsequently modify PSTATE.
As ERET is not always a Context Synchronization Event, an ISB is placed
before each exception return to ensure updates to control registers have
taken effect. This handles the kernel being entered with SCTLR_ELx.EOS
clear (or any future control bits being in an UNKNOWN state).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Let's make SCTLR_ELx initialization a bit clearer by using meaningful
names for the initialization values, following the same scheme for
SCTLR_EL1 and SCTLR_EL2.
These definitions will be used more widely in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For a while now el2_setup has performed some basic initialization of EL1
even when the kernel is booted at EL1, so the name is a little
misleading. Further, some comments are stale as with VHE it doesn't drop
the CPU to EL1.
To clarify things, rename el2_setup to init_kernel_el, and update
comments to be clearer as to the function's purpose.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
To make callsites easier to read, add trivial C wrappers for the
SET_PSTATE_*() helpers, and convert trivial uses over to these. The new
wrappers will be used further in subsequent patches.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
For consistency, all tasks have a pt_regs reserved at the highest
portion of their task stack. Among other things, this ensures that a
task's SP is always pointing within its stack rather than pointing
immediately past the end.
While it is never legitimate to ERET from a kthread, we take pains to
initialize pt_regs for kthreads as if this were legitimate. As this is
never legitimate, the effects of an erroneous return are rarely tested.
Let's simplify things by initializing a kthread's pt_regs such that an
ERET is caught as an illegal exception return, and removing the explicit
initialization of other exception context. Note that as
spectre_v4_enable_task_mitigation() only manipulates the PSTATE within
the unused regs this is safe to remove.
As user tasks will have their exception context initialized via
start_thread() or start_compat_thread(), this should only impact cases
where something has gone very wrong and we'd like that to be clearly
indicated.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201113124937.20574-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Handling all combinations of the VMAP_STACK and SHADOW_CALL_STACK options
in sdei_arch_get_entry_point() makes the code difficult to read,
particularly when considering the error and cleanup paths.
Move the checking of these options into the callee functions, so that
they return early if the relevant option is not enabled.
Signed-off-by: Will Deacon <will@kernel.org>
Use scs_alloc() to allocate also IRQ and SDEI shadow stacks instead of
using statically allocated stacks.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130233442.2562064-3-samitolvanen@google.com
[will: Move CONFIG_SHADOW_CALL_STACK check into init_irq_scs()]
Signed-off-by: Will Deacon <will@kernel.org>
In debug_exception_enter() and debug_exception_exit() we trace hardirqs
on/off while RCU isn't guaranteed to be watching, and we don't save and
restore the hardirq state, and so may return with this having changed.
Handle this appropriately with new entry/exit helpers which do the bare
minimum to ensure this is appropriately maintained, without marking
debug exceptions as NMIs. These are placed in entry-common.c with the
other entry/exit helpers.
In future we'll want to reconsider whether some debug exceptions should
be NMIs, but this will require a significant refactoring, and for now
this should prevent issues with lockdep and RCU.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marins <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-12-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Exceptions which can be taken at (almost) any time are consdiered to be
NMIs. On arm64 that includes:
* SDEI events
* GICv3 Pseudo-NMIs
* Kernel stack overflows
* Unexpected/unhandled exceptions
... but currently debug exceptions (BRKs, breakpoints, watchpoints,
single-step) are not considered NMIs.
As these can be taken at any time, kernel features (lockdep, RCU,
ftrace) may not be in a consistent kernel state. For example, we may
take an NMI from the idle code or partway through an entry/exit path.
While nmi_enter() and nmi_exit() handle most of this state, notably they
don't save/restore the lockdep state across an NMI being taken and
handled. When interrupts are enabled and an NMI is taken, lockdep may
see interrupts become disabled within the NMI code, but not see
interrupts become enabled when returning from the NMI, leaving lockdep
believing interrupts are disabled when they are actually disabled.
The x86 code handles this in idtentry_{enter,exit}_nmi(), which will
shortly be moved to the generic entry code. As we can't use either yet,
we copy the x86 approach in arm64-specific helpers. All the NMI
entrypoints are marked as noinstr to prevent any instrumentation
handling code being invoked before the state has been corrected.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-11-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
There are periods in kernel mode when RCU is not watching and/or the
scheduler tick is disabled, but we can still take exceptions such as
interrupts. The arm64 exception handlers do not account for this, and
it's possible that RCU is not watching while an exception handler runs.
The x86/generic entry code handles this by ensuring that all (non-NMI)
kernel exception handlers call irqentry_enter() and irqentry_exit(),
which handle RCU, lockdep, and IRQ flag tracing. We can't yet move to
the generic entry code, and already hadnle the user<->kernel transitions
elsewhere, so we add new kernel<->kernel transition helpers alog the
lines of the generic entry code.
Since we now track interrupts becoming masked when an exception is
taken, local_daif_inherit() is modified to track interrupts becoming
re-enabled when the original context is inherited. To balance the
entry/exit paths, each handler masks all DAIF exceptions before
exit_to_kernel_mode().
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-10-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
When built with PROVE_LOCKING, NO_HZ_FULL, and CONTEXT_TRACKING_FORCE
will WARN() at boot time that interrupts are enabled when we call
context_tracking_user_enter(), despite the DAIF flags indicating that
IRQs are masked.
The problem is that we're not tracking IRQ flag changes accurately, and
so lockdep believes interrupts are enabled when they are not (and
vice-versa). We can shuffle things so to make this more accurate. For
kernel->user transitions there are a number of constraints we need to
consider:
1) When we call __context_tracking_user_enter() HW IRQs must be disabled
and lockdep must be up-to-date with this.
2) Userspace should be treated as having IRQs enabled from the PoV of
both lockdep and tracing.
3) As context_tracking_user_enter() stops RCU from watching, we cannot
use RCU after calling it.
4) IRQ flag tracing and lockdep have state that must be manipulated
before RCU is disabled.
... with similar constraints applying for user->kernel transitions, with
the ordering reversed.
The generic entry code has enter_from_user_mode() and
exit_to_user_mode() helpers to handle this. We can't use those directly,
so we add arm64 copies for now (without the instrumentation markers
which aren't used on arm64). These replace the existing user_exit() and
user_exit_irqoff() calls spread throughout handlers, and the exception
unmasking is left as-is.
Note that:
* The accounting for debug exceptions from userspace now happens in
el0_dbg() and ret_to_user(), so this is removed from
debug_exception_enter() and debug_exception_exit(). As
user_exit_irqoff() wakes RCU, the userspace-specific check is removed.
* The accounting for syscalls now happens in el0_svc(),
el0_svc_compat(), and ret_to_user(), so this is removed from
el0_svc_common(). This does not adversely affect the workaround for
erratum 1463225, as this does not depend on any of the state tracking.
* In ret_to_user() we mask interrupts with local_daif_mask(), and so we
need to inform lockdep and tracing. Here a trace_hardirqs_off() is
sufficient and safe as we have not yet exited kernel context and RCU
is usable.
* As PROVE_LOCKING selects TRACE_IRQFLAGS, the ifdeferry in entry.S only
needs to check for the latter.
* EL0 SError handling will be dealt with in a subsequent patch, as this
needs to be treated as an NMI.
Prior to this patch, booting an appropriately-configured kernel would
result in spats as below:
| DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())
| WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5280 check_flags.part.54+0x1dc/0x1f0
| Modules linked in:
| CPU: 2 PID: 1 Comm: init Not tainted 5.10.0-rc3 #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 804003c5 (Nzcv DAIF +PAN -UAO -TCO BTYPE=--)
| pc : check_flags.part.54+0x1dc/0x1f0
| lr : check_flags.part.54+0x1dc/0x1f0
| sp : ffff80001003bd80
| x29: ffff80001003bd80 x28: ffff66ce801e0000
| x27: 00000000ffffffff x26: 00000000000003c0
| x25: 0000000000000000 x24: ffffc31842527258
| x23: ffffc31842491368 x22: ffffc3184282d000
| x21: 0000000000000000 x20: 0000000000000001
| x19: ffffc318432ce000 x18: 0080000000000000
| x17: 0000000000000000 x16: ffffc31840f18a78
| x15: 0000000000000001 x14: ffffc3184285c810
| x13: 0000000000000001 x12: 0000000000000000
| x11: ffffc318415857a0 x10: ffffc318406614c0
| x9 : ffffc318415857a0 x8 : ffffc31841f1d000
| x7 : 647261685f706564 x6 : ffffc3183ff7c66c
| x5 : ffff66ce801e0000 x4 : 0000000000000000
| x3 : ffffc3183fe00000 x2 : ffffc31841500000
| x1 : e956dc24146b3500 x0 : 0000000000000000
| Call trace:
| check_flags.part.54+0x1dc/0x1f0
| lock_is_held_type+0x10c/0x188
| rcu_read_lock_sched_held+0x70/0x98
| __context_tracking_enter+0x310/0x350
| context_tracking_enter.part.3+0x5c/0xc8
| context_tracking_user_enter+0x6c/0x80
| finish_ret_to_user+0x2c/0x13cr
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-8-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for reworking the EL1 irq/nmi entry code, move the
existing logic to C. We no longer need the asm_nmi_enter() and
asm_nmi_exit() wrappers, so these are removed. The new C functions are
marked noinstr, which prevents compiler instrumentation and runtime
probing.
In subsequent patches we'll want the new C helpers to be called in all
cases, so we don't bother wrapping the calls with ifdeferry. Even when
the new C functions are stubs the trivial calls are unlikely to have a
measurable impact on the IRQ or NMI paths anyway.
Prototypes are added to <asm/exception.h> as otherwise (in some
configurations) GCC will complain about the lack of a forward
declaration. We already do this for existing function, e.g.
enter_from_user_mode().
The new helpers are marked as noinstr (which prevents all
instrumentation, tracing, and kprobes). Otherwise, there should be no
functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-7-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In a subsequent patch ret_to_user will need to make a C function call
(in some configurations) which may clobber x0-x18 at the start of the
finish_ret_to_user block, before enable_step_tsk consumes the flags
loaded into x1.
In preparation for this, let's load the flags into x19, which is
preserved across C function calls. This avoids a redundant reload of the
flags and ensures we operate on a consistent shapshot regardless.
There should be no functional change as a result of this patch. At this
point of the entry/exit paths we only need to preserve x28 (tsk) and the
sp, and x19 is free for this use.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-6-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In later patches we'll want to extend enter_from_user_mode() and add a
corresponding exit_to_user_mode(). As these will be common for all
entries/exits from userspace, it'd be better for these to live in
entry-common.c with the rest of the entry logic.
This patch moves enter_from_user_mode() into entry-common.c. As with
other functions in entry-common.c it is marked as noinstr (which
prevents all instrumentation, tracing, and kprobes) but there are no
other functional changes.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Functions in entry-common.c are marked as notrace and NOKPROBE_SYMBOL(),
but they're still subject to other instrumentation which may rely on
lockdep/rcu/context-tracking being up-to-date, and may cause nested
exceptions (e.g. for WARN/BUG or KASAN's use of BRK) which will corrupt
exceptions registers which have not yet been read.
Prevent this by marking all functions in entry-common.c as noinstr to
prevent compiler instrumentation. This also blacklists the functions for
tracing and kprobes, so we don't need to handle that separately.
Functions elsewhere will be dealt with in subsequent patches.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-4-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Core code disables RCU when calling arch_cpu_idle(), so it's not safe
for arch_cpu_idle() or its calees to be instrumented, as the
instrumentation callbacks may attempt to use RCU or other features which
are unsafe to use in this context.
Mark them noinstr to prevent issues.
The use of local_irq_enable() in arch_cpu_idle() is similarly
problematic, and the "sched/idle: Fix arch_cpu_idle() vs tracing" patch
queued in the tip tree addresses that case.
Reported-by: Marco Elver <elver@google.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-3-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
In el0_svc_common() we unmask exceptions before we call user_exit(), and
so there's a window where an IRQ or debug exception can be taken while
RCU is not watching. In do_debug_exception() we account for this in via
debug_exception_{enter,exit}(), but in the el1_irq asm we do not and we
call trace functions which rely on RCU before we have a guarantee that
RCU is watching.
Let's avoid this by having el0_svc_common() exit userspace before
unmasking exceptions, matching what we do for all other EL0 entry paths.
We can use user_exit_irqoff() to avoid the pointless save/restore of IRQ
flags while we're sure exceptions are masked in DAIF.
The workaround for Cortex-A76 erratum 1463225 may trigger a debug
exception before this point, but the debug code invoked in this case is
safe even when RCU is not watching.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201130115950.22492-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
idle path. Similar to the entry path the low level idle functions have to
be non-instrumentable.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/DpAUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXSLD/9klc0YimnEnROW6Q5Svb2IcyIutmXF
bOIY1bYYoKILOBj3wyvDUhmdMuq5zh7H9yG11hO8MaVVWVQcLcOMLdHTYm9dcdmF
xQk33+xqjuhRShB+nEmC9ayYtWogtH6W6uZ6WDtF9ZltMKU85n5ddGJ/Fvo+HoCb
NbOdHGJdJ3/3ZCeHnxOnxM+5/GwjkBuccTV/tXmb3yXrfU9DBySyQ4/UchcpF43w
LcEb0kiQbpZsBTByKJOQV8+RR654S0sILlvRwVXpmj94vrgGwhlVk1/9rz7tkOhF
ksoo1mTVu75LMt22G/hXxE63787yRvFdHjapf0+kCOAuhl992NK+xlGDH8o9DXcu
9y73D4bI0HnDFs20w6vs20iLvxECJiYHJqlgR5ZwFUToceaNgtiYr8kzuD7Zbae1
KG2E7BuNSwHWMtf97fGn44GZknPEOaKdDn4Wv6/bvKHxLm77qe11RKF70Stcz2AI
am13KmQzzsHGF5qNWwpElRUxSdxfJMR66RnOdTQULGrRedaZTFol/y2pnVzTSe3k
SZnlpL5kE7y92UYDogPb5wWA7b+YkJN0OdSkRFy1FH26ZG8E4M7ZJ2tql5Sw7pGM
lsTjXpAUphnK5rz7QcYE8KAZWj//fIAcElIrvdklVcBnS3IqjfksYW27B64133vx
cT1B/lA1PHXj6Q==
=raED
-----END PGP SIGNATURE-----
Merge tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Thomas Gleixner:
"Two more places which invoke tracing from RCU disabled regions in the
idle path.
Similar to the entry path the low level idle functions have to be
non-instrumentable"
* tag 'locking-urgent-2020-11-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
intel_idle: Fix intel_idle() vs tracing
sched/idle: Fix arch_cpu_idle() vs tracing
Our Meltdown mitigation state isn't exposed outside of cpufeature.c,
contrary to the rest of the Spectre mitigation state. As we are going
to use it in KVM, expose a arm64_get_meltdown_state() helper which
returns the same possible values as arm64_get_spectre_v?_state().
Signed-off-by: Marc Zyngier <maz@kernel.org>
We currently try to emit *.init.rodata.* twice, once in INIT_DATA, and once
in the line immediately following it. As the two section definitions are
identical, the latter is redundant and can be dropped.
This patch drops the redundant *.init.rodata.* section definition.
Signed-off-by: Youling Tang <tangyouling@loongson.cn>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/1605750340-910-1-git-send-email-tangyouling@loongson.cn
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Provide support for additional kernel command line parameters to be
concatenated onto the end of the command line provided by the
bootloader. Additional parameters are specified in the CONFIG_CMDLINE
option when CONFIG_CMDLINE_EXTEND is selected, matching other
architectures and leveraging existing support in the FDT and EFI stub
code.
Special care must be taken for the arch-specific nokaslr parsing. Search
the bootargs FDT property and the CONFIG_CMDLINE when
CONFIG_CMDLINE_EXTEND is in use.
There are a couple of known use cases for this feature:
1) Switching between stable and development kernel versions, where one
of the versions benefits from additional command line parameters,
such as debugging options.
2) Specifying additional command line parameters, for additional tuning
or debugging, when the bootloader does not offer an interactive mode.
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200921191557.350256-3-tyhicks@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Don't ask for *the* command line string to search for "nokaslr" in
kaslr_early_init(). Instead, tell a helper function to search all the
appropriate command line strings for "nokaslr" and return the result.
This paves the way for searching multiple command line strings without
having to concatenate the strings in early init.
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Link: https://lore.kernel.org/r/20200921191557.350256-2-tyhicks@linux.microsoft.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Directly using the kimage_voffset variable is fine for now, but
will become more problematic as we start distrusting EL1.
Instead, patch the kimage_voffset into the HYP text, ensuring
we don't have to load an untrusted value later on.
Signed-off-by: Marc Zyngier <maz@kernel.org>
With the recent feature added to enable perf events to use pseudo NMIs
as interrupts on platforms which support GICv3 or later, its now been
possible to enable hard lockup detector (or NMI watchdog) on arm64
platforms. So enable corresponding support.
One thing to note here is that normally lockup detector is initialized
just after the early initcalls but PMU on arm64 comes up much later as
device_initcall(). So we need to re-initialize lockup detection once
PMU has been initialized.
Signed-off-by: Sumit Garg <sumit.garg@linaro.org>
Acked-by: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/r/1602060704-10921-1-git-send-email-sumit.garg@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.
(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)
Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
Task scheduler behavior depends on frequency invariance (FI) support and
the resulting invariant load tracking signals. For example, in order to
make accurate predictions across CPUs for all performance states, Energy
Aware Scheduling (EAS) needs frequency-invariant load tracking signals
and therefore it has a direct dependency on FI. This dependency is known,
but EAS enablement is not yet conditioned on the presence of FI during
the built of the scheduling domain hierarchy.
Before this is done, the following must be considered: while
arch_scale_freq_invariant() will see changes in FI support and could
be used to condition the use of EAS, it could return different values
during system initialisation.
For arm64, such a scenario will happen for a system that does not support
cpufreq driven FI, but does support counter-driven FI. For such a system,
arch_scale_freq_invariant() will return false if called before counter
based FI initialisation, but change its status to true after it.
If EAS becomes explicitly dependent on FI this would affect the task
scheduler behavior which builds its scheduling domain hierarchy well
before the late counter-based FI init. During that process, EAS would be
disabled due to its dependency on FI.
Two points of future early calls to arch_scale_freq_invariant() which
would determine EAS enablement are:
- (1) drivers/base/arch_topology.c:126 <<update_topology_flags_workfn>>
rebuild_sched_domains();
This will happen after CPU capacity initialisation.
- (2) kernel/sched/cpufreq_schedutil.c:917 <<rebuild_sd_workfn>>
rebuild_sched_domains_energy();
-->rebuild_sched_domains();
This will happen during sched_cpufreq_governor_change() for the
schedutil cpufreq governor.
Therefore, before enforcing the presence of FI support for the use of EAS,
ensure the following: if there is a change in FI support status after
counter init, use the existing rebuild_sched_domains_energy() function to
trigger a rebuild of the scheduling and performance domains that in turn
will determine the enablement of EAS.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lkml.kernel.org/r/20201027180713.7642-3-ionela.voinescu@arm.com
We don't need to check for MTE support before checking the flag
because it can only be set if the hardware supports MTE. As a result
we can unconditionally check the flag bit which is expected to be in
a register and therefore the check can be done in a single instruction
instead of first needing to load the hwcaps.
On a DragonBoard 845c with a kernel built with CONFIG_ARM64_MTE=y with
the powersave governor this reduces the cost of a kernel entry/exit
(invalid syscall) from 465.1ns to 463.8ns.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/If4dc3501fd4e4f287322f17805509613cfe47d24
Link: https://lore.kernel.org/r/20201118032051.1405907-1-pcc@google.com
[catalin.marinas@arm.com: remove IS_ENABLED(CONFIG_ARM64_MTE)]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Even though support for EFI boot remains entirely optional for arm64,
it is unlikely that we will ever be able to repurpose the image header
fields that the EFI loader relies on, i.e., the magic NOP at offset
0x0 and the PE header address at offset 0x3c.
So let's factor out the differences into a 'efi_signature_nop' macro and
a local symbol representing the PE header address, and move the
conditional definitions into efi-header.S, taking into account whether
CONFIG_EFI is enabled or not. While at it, switch to a signature NOP
that behaves more like a NOP, i.e., one that only clobbers the
flags.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-4-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We no longer map the first 64 KB of the kernel image, as there is nothing
there that we ever need to refer back to once the kernel has booted. Even
though facilities like kallsyms are very careful to only refer to the
region that starts at _stext when mapping virtual addresses to symbol
names, let's avoid any confusion by switching to local .L prefixed symbol
names for the EFI header, as none of them have any significance to the
rest of the kernel.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-3-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In a previous patch, we increased the size of the EFI PE/COFF header
to 64 KB, which resulted in the _stext symbol to appear at a fixed
offset of 64 KB into the image.
Since 64 KB is also the largest page size we support, this completely
removes the need to map the first 64 KB of the kernel image, given that
it only contains the arm64 Image header and the EFI header, neither of
which we ever access again after booting the kernel. More importantly,
we should avoid an executable mapping of non-executable and not entirely
predictable data, to deal with the unlikely event that we inadvertently
emitted something that looks like an opcode that could be used as a
gadget for speculative execution.
So let's limit the kernel mapping of .text to the [_stext, _etext)
region, which matches the view of generic code (such as kallsyms) when
it reasons about the boundaries of the kernel's .text section.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201117124729.12642-2-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The spectre-v3a mitigation is split between cpu_errata.c and spectre.c,
with the former handling detection of the problem and the latter handling
enabling of the workaround.
Move the detection logic alongside the enabling logic, like we do for the
other spectre mitigations.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-10-will@kernel.org
Since ARM64_HARDEN_EL2_VECTORS is really a mitigation for Spectre-v3a,
rename it accordingly for consistency with the v2 and v4 mitigation.
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-9-will@kernel.org
The EL2 vectors installed when a guest is running point at one of the
following configurations for a given CPU:
- Straight at __kvm_hyp_vector
- A trampoline containing an SMC sequence to mitigate Spectre-v2 and
then a direct branch to __kvm_hyp_vector
- A dynamically-allocated trampoline which has an indirect branch to
__kvm_hyp_vector
- A dynamically-allocated trampoline containing an SMC sequence to
mitigate Spectre-v2 and then an indirect branch to __kvm_hyp_vector
The indirect branches mean that VA randomization at EL2 isn't trivially
bypassable using Spectre-v3a (where the vector base is readable by the
guest).
Rather than populate these vectors dynamically, configure everything
statically and use an enumerated type to identify the vector "slot"
corresponding to one of the configurations above. This both simplifies
the code, but also makes it much easier to implement at EL2 later on.
Signed-off-by: Will Deacon <will@kernel.org>
[maz: fixed double call to kvm_init_vector_slots() on nVHE]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20201113113847.21619-8-will@kernel.org
- A set of commits which reduce the stack usage of various perf event
handling functions which allocated large data structs on stack causing
stack overflows in the worst case.
- Use the proper mechanism for detecting soft interrupts in the recursion
protection.
- Make the resursion protection simpler and more robust.
- Simplify the scheduling of event groups to make the code more robust and
prepare for fixing the issues vs. scheduling of exclusive event groups.
- Prevent event multiplexing and rotation for exclusive event groups
- Correct the perf event attribute exclusive semantics to take pinned
events, e.g. the PMU watchdog, into account
- Make the anythread filtering conditional for Intel's generic PMU
counters as it is not longer guaranteed to be supported on newer
CPUs. Check the corresponding CPUID leaf to make sure.
- Fixup a duplicate initialization in an array which was probably cause by
the usual copy & paste - forgot to edit mishap.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+xIi0THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYofixD/4+4gc8DhOmAkMrN0Z9tiW8ebgMKmb9
wZRkMr5Osi0GzLJOPZ6SdY6jd0A3rMN/sW6P1DT6pDtcty4bKFoW5VZBuUDIAhel
BC4C93L3y1En/GEZu1GTy3LvsBwLBQTOoY4goDjbdAbk60S/0RTHOGyQsRsOQFe6
fVs3iXozAFuaR6I6N3dlxuJAE51zvr8MyBWaUoByNDB//1+lLNW+JfClaAOG1oXx
qZIg/niatBVGzSGgKNRUyh3g8G1HJtabsA/NZ4PH8ZHuYABfmj4lmmUPR77ICLfV
wMITEBG7eaktB8EqM9hvaoOZLA5kpXHO2JbCFSs4c4x11mlC8g7QMV3poCw33YoN
a5TmT1A3muri1riy1/Ee9lXACOq7/tf2+Xfn9o6dvDdBwd6s5pzlhLGR8gILp2lF
2bcg3IwYvHT/Kiurb/WGNpbCqQIPJpcUcfs3tNBCCtKegahUQNnGjxN3NVo9RCit
zfL6xIJ8eZiYnsxXx4NKm744AukWiql3aRNgRkOdBP5WC68xt6VLcxG1YZKUoDhy
jRSOCD/DuPSMSvAAgN7S8OWlPsKWBxVxxWYV+K8FpwhgzbQ3WbS3UDiYkhgjeOxu
OlM692oWpllKvQWlvYthr2Be6oPCRRi1vvADNNbTKzgHk5i61bwympsGl1EZx3Pz
2ROp7NJFRESnqw==
=FzCf
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A set of fixes for perf:
- A set of commits which reduce the stack usage of various perf
event handling functions which allocated large data structs on
stack causing stack overflows in the worst case
- Use the proper mechanism for detecting soft interrupts in the
recursion protection
- Make the resursion protection simpler and more robust
- Simplify the scheduling of event groups to make the code more
robust and prepare for fixing the issues vs. scheduling of
exclusive event groups
- Prevent event multiplexing and rotation for exclusive event groups
- Correct the perf event attribute exclusive semantics to take
pinned events, e.g. the PMU watchdog, into account
- Make the anythread filtering conditional for Intel's generic PMU
counters as it is not longer guaranteed to be supported on newer
CPUs. Check the corresponding CPUID leaf to make sure
- Fixup a duplicate initialization in an array which was probably
caused by the usual 'copy & paste - forgot to edit' mishap"
* tag 'perf-urgent-2020-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel/uncore: Fix Add BW copypasta
perf/x86/intel: Make anythread filter support conditional
perf: Tweak perf_event_attr::exclusive semantics
perf: Fix event multiplexing for exclusive groups
perf: Simplify group_sched_in()
perf: Simplify group_sched_out()
perf/x86: Make dummy_iregs static
perf/arch: Remove perf_sample_data::regs_user_copy
perf: Optimize get_recursion_context()
perf: Fix get_recursion_context()
perf/x86: Reduce stack usage for x86_pmu::drain_pebs()
perf: Reduce stack usage of perf_output_begin()
Given that smp_call_function_single() can deadlock when interrupts are
disabled, abort the SMP call if irqs_disabled(). This scenario is
currently not possible given the function's uses, but safeguard this for
potential future uses.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20201113155328.4194-1-ionela.voinescu@arm.com
[catalin.marinas@arm.com: modified following Mark's comment]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
If Activity Monitors (AMUs) are present, two of the counters can be used
to implement support for CPPC's (Collaborative Processor Performance
Control) delivered and reference performance monitoring functionality
using FFH (Functional Fixed Hardware).
Given that counters for a certain CPU can only be read from that CPU,
while FFH operations can be called from any CPU for any of the CPUs, use
smp_call_function_single() to provide the requested values.
Therefore, depending on the register addresses, the following values
are returned:
- 0x0 (DeliveredPerformanceCounterRegister): AMU core counter
- 0x1 (ReferencePerformanceCounterRegister): AMU constant counter
The use of Activity Monitors is hidden behind the generic
cpu_read_{corecnt,constcnt}() functions.
Read functionality for these two registers represents the only current
FFH support for CPPC. Read operations for other register values or write
operation for all registers are unsupported. Therefore, keep CPPC's FFH
unsupported if no CPUs have valid AMU frequency counters. For this
purpose, the get_cpu_with_amu_feat() is introduced.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-4-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In order for the counter validation function to be reused, split
validate_cpu_freq_invariance_counters() into:
- freq_counters_valid(cpu) - check cpu for valid cycle counters
- freq_inv_set_max_ratio(int cpu, u64 max_rate, u64 ref_rate) -
generic function that sets the normalization ratio used by
topology_scale_freq_tick()
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-3-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
In preparation for other uses of Activity Monitors (AMU) cycle counters,
place counter read functionality in generic functions that can reused:
read_corecnt() and read_constcnt().
As a result, implement update_freq_counters_refs() to replace
init_cpu_freq_invariance_counters() and both initialise and update
the per-cpu reference variables.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201106125334.21570-2-ionela.voinescu@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
- Fix RCU splat when failing to online a CPU due to a feature mismatch
- Fix a recently introduced sparse warning in kexec()
- Fix handling of CPU erratum 1418040 for late CPUs
- Ensure hot-added memory falls within linear-mapped region
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+ubogQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPD7B/9i5ao44AEJwjz0a68S/jD7kUD7i3xVkCNN
Y8i/i9mx44IAcf8pmyQh3ngaFlJuF2C6oC/SQFiDbmVeGeZXLXvXV7uGAqXG5Xjm
O2Svgr1ry176JWpsB7MNnZwzAatQffdkDjbjQCcUnUIKYcLvge8H2fICljujGcfQ
094vNmT9VerTWRbWDti3Ck/ug+sanVHuzk5BWdKx3jamjeTqo+sBZK/wgBr6UoYQ
mT3BFX42kLIGg+AzwXRDPlzkJymjYgQDbSwGsvny8qKdOEJbAUwWXYZ5sTs9J/gU
E9PT3VJI7BYtTd1uPEWkD645U3arfx3Pf2JcJlbkEp86qx4CUF9s
=T6k4
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Spectre/Meltdown safelisting for some Qualcomm KRYO cores
- Fix RCU splat when failing to online a CPU due to a feature mismatch
- Fix a recently introduced sparse warning in kexec()
- Fix handling of CPU erratum 1418040 for late CPUs
- Ensure hot-added memory falls within linear-mapped region
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpu_errata: Apply Erratum 845719 to KRYO2XX Silver
arm64: proton-pack: Add KRYO2XX silver CPUs to spectre-v2 safe-list
arm64: kpti: Add KRYO2XX gold/silver CPU cores to kpti safelist
arm64: Add MIDR value for KRYO2XX gold/silver CPU cores
arm64/mm: Validate hotplug range before creating linear mapping
arm64: smp: Tell RCU about CPUs that fail to come online
arm64: psci: Avoid printing in cpu_psci_cpu_die()
arm64: kexec_file: Fix sparse warning
arm64: errata: Fix handling of 1418040 with late CPU onlining
QCOM KRYO2XX Silver cores are Cortex-A53 based and are
susceptible to the 845719 erratum. Add them to the lookup
list to apply the erratum.
Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20201104232218.198800-5-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
QCOM KRYO2XX gold (big) silver (LITTLE) CPU cores are based on
Cortex-A73 and Cortex-A53 respectively and are meltdown safe,
hence add them to kpti_safe_list[].
Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Link: https://lore.kernel.org/r/20201104232218.198800-3-konrad.dybcio@somainline.org
Signed-off-by: Will Deacon <will@kernel.org>
Mapping between IPI type index and its string is direct without requiring
an additional offset. Hence the existing macro S(x, s) is now redundant
and can just be dropped. This also makes the code clean and simple.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Link: https://lore.kernel.org/r/1604921916-23368-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Depending on configuration options and specific code paths, we either
use the empty_zero_page or the configuration-dependent reserved_ttbr0
as a reserved value for TTBR{0,1}_EL1.
To simplify this code, let's always allocate and use the same
reserved_pg_dir, replacing reserved_ttbr0. Note that this is allocated
(and hence pre-zeroed), and is also marked as read-only in the kernel
Image mapping.
Keeping this separate from the empty_zero_page potentially helps with
robustness as the empty_zero_page is used in a number of cases where a
failure to map it read-only could allow it to become corrupted.
The (presently unused) swapper_pg_end symbol is also removed, and
comments are added wherever we rely on the offsets between the
pre-allocated pg_dirs to keep these cases easily identifiable.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201103102229.8542-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The kprobe_step_ctx (kcb->ss_ctx) has ss_pending and match_addr, but
those are redundant because those can be replaced by KPROBE_HIT_SS and
&cur_kprobe->ainsn.api.insn[1] respectively.
To simplify the code, remove the kprobe_step_ctx.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201103134900.337243-2-jean-philippe@linaro.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Commit ce3d31ad3c ("arm64/smp: Move rcu_cpu_starting() earlier") ensured
that RCU is informed early about incoming CPUs that might end up calling
into printk() before they are online. However, if such a CPU fails the
early CPU feature compatibility checks in check_local_cpu_capabilities(),
then it will be powered off or parked without informing RCU, leading to
an endless stream of stalls:
| rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
| rcu: 2-O...: (0 ticks this GP) idle=002/1/0x4000000000000000 softirq=0/0 fqs=2593
| (detected by 0, t=5252 jiffies, g=9317, q=136)
| Task dump for CPU 2:
| task:swapper/2 state:R running task stack: 0 pid: 0 ppid: 1 flags:0x00000028
| Call trace:
| ret_from_fork+0x0/0x30
Ensure that the dying CPU invokes rcu_report_dead() prior to being powered
off or parked.
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Suggested-by: Qian Cai <cai@redhat.com>
Link: https://lore.kernel.org/r/20201105222242.GA8842@willie-the-truck
Link: https://lore.kernel.org/r/20201106103602.9849-3-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
cpu_psci_cpu_die() is called in the context of the dying CPU, which
will no longer be online or tracked by RCU. It is therefore not generally
safe to call printk() if the PSCI "cpu off" request fails, so remove the
pr_crit() invocation.
Cc: Qian Cai <cai@redhat.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20201106103602.9849-2-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Sparse gets cross about us returning 0 from image_load(), which has a
return type of 'void *':
>> arch/arm64/kernel/kexec_image.c:130:16: sparse: sparse: Using plain integer as NULL pointer
Return NULL instead, as we don't use the return value for anything if it
does not indicate an error.
Cc: Benjamin Gwin <bgwin@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 108aa50365 ("arm64: kexec_file: try more regions if loading segments fails")
Link: https://lore.kernel.org/r/202011091736.T0zH8kaC-lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>
In a surprising turn of events, it transpires that CPU capabilities
configured as ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE are never set as the
result of late-onlining. Therefore our handling of erratum 1418040 does
not get activated if it is not required by any of the boot CPUs, even
though we allow late-onlining of an affected CPU.
In order to get things working again, replace the cpus_have_const_cap()
invocation with an explicit check for the current CPU using
this_cpu_has_cap().
Cc: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201106114952.10032-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When building with LTO, there is an increased risk of the compiler
converting an address dependency headed by a READ_ONCE() invocation
into a control dependency and consequently allowing for harmful
reordering by the CPU.
Ensure that such transformations are harmless by overriding the generic
READ_ONCE() definition with one that provides acquire semantics when
building with LTO.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
In preparation for patching the internals of READ_ONCE() itself, replace
its usage on the alternatives patching patch with a volatile variable
instead.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Armv8.3 introduced the LDAPR instruction, which provides weaker memory
ordering semantics than LDARi (RCpc vs RCsc). Generally, we provide an
RCsc implementation when implementing the Linux memory model, but LDAPR
can be used as a useful alternative to dependency ordering, particularly
when the compiler is capable of breaking the dependencies.
Since LDAPR is not available on all CPUs, add a cpufeature to detect it at
runtime and allow the instruction to be used with alternative code
patching.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
asm/alternative.h contains both the macros needed to use alternatives,
as well the type definitions and function prototypes for applying them.
Split the header in two, so that alternatives can be used from core
header files such as linux/compiler.h without the risk of circular
includes
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
struct perf_sample_data lives on-stack, we should be careful about it's
size. Furthermore, the pt_regs copy in there is only because x86_64 is a
trainwreck, solve it differently.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lkml.kernel.org/r/20201030151955.258178461@infradead.org
Now that we can use function pointer, use a dispatch table to call
the individual HVC handlers, leading to more maintainable code.
Further improvements include helpers to declare the mapping of
local variables to values passed in the host context.
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
- Fix early use of kprobes
- Fix kernel placement in kexec_file_load()
- Bump maximum number of NUMA nodes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+lLeQQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNHgjB/9RJMWwEo6TJ0JyJdBmgEy9k+F5k7zUEtNO
dmZBXt1V8Gvw2MLRKAayWLumoJCUf0ZTICJ9+wnYAKkGtKvDfuEofrEOe/W/jB8m
V2Nm7Y+UWL/D0E5+jdyGIqsPiljaZg8GCyOxN6BDuqgl/T8/3YlpSudMvlr7xm8s
F71k2u2EvSybcRFmtp9A5x0eUeWRSQtLa1+fWmpyAPAX64YJ9bh2w3/g5SecocUK
Ra8H91XO5BT2sHsDDQe67iUfZz9Y1N1UbNiuzCZIL7+xTcQ6DKw4JJ/2Z5BfkH0D
04THZZqYt5AjYQmUULMmPcbSzMp4E30s5dmckevq8E+LG0imLDYp
=w7Ip
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here's the weekly batch of fixes for arm64. Not an awful lot here, but
there are still a few unresolved issues relating to CPU hotplug, RCU
and IRQ tracing that I hope to queue fixes for next week.
Summary:
- Fix early use of kprobes
- Fix kernel placement in kexec_file_load()
- Bump maximum number of NUMA nodes"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: kexec_file: try more regions if loading segments fails
arm64: kprobes: Use BRK instead of single-step when executing instructions out-of-line
arm64: NUMA: Kconfig: Increase NODES_SHIFT to 4
It's possible that the first region picked for the new kernel will make
it impossible to fit the other segments in the required 32GB window,
especially if we have a very large initrd.
Instead of giving up, we can keep testing other regions for the kernel
until we find one that works.
Suggested-by: Ryan O'Leary <ryanoleary@google.com>
Signed-off-by: Benjamin Gwin <bgwin@google.com>
Link: https://lore.kernel.org/r/20201103201106.2397844-1-bgwin@google.com
Signed-off-by: Will Deacon <will@kernel.org>
Commit 36dadef23f ("kprobes: Init kprobes in early_initcall") enabled
using kprobes from early_initcall. Unfortunately at this point the
hardware debug infrastructure is not operational. The OS lock may still
be locked, and the hardware watchpoints may have unknown values when
kprobe enables debug monitors to single-step instructions.
Rather than using hardware single-step, append a BRK instruction after
the instruction to be executed out-of-line.
Fixes: 36dadef23f ("kprobes: Init kprobes in early_initcall")
Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20201103134900.337243-1-jean-philippe@linaro.org
Signed-off-by: Will Deacon <will@kernel.org>
* selftest fix
* Force PTE mapping on device pages provided via VFIO
* Fix detection of cacheable mapping at S2
* Fallback to PMD/PTE mappings for composite huge pages
* Fix accounting of Stage-2 PGD allocation
* Fix AArch32 handling of some of the debug registers
* Simplify host HYP entry
* Fix stray pointer conversion on nVHE TLB invalidation
* Fix initialization of the nVHE code
* Simplify handling of capabilities exposed to HYP
* Nuke VCPUs caught using a forbidden AArch32 EL0
x86:
* new nested virtualization selftest
* Miscellaneous fixes
* make W=1 fixes
* Reserve new CPUID bit in the KVM leaves
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+dhRAUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPWCgf/U997UW/11IdNtkehQO/DFdx7lHev
+IahN1Pnbt92ZoR5nGhK9pgvDahIVhqTmUvgV+3fD24OnqXTpYTu1fliBvL6ynbN
J9Ycf0zFAgwfgTTD5UexTlEovnhX4xz7NDmd6rpxGDZdMaBHQFPkCXBFK45pf4nd
O349aHV0X1AA7Tt/sLhpXpi74Vake1xErLHKhIVLHKyo/zDm+Q0UZry068NNBzTr
St3+QSGlFXhuekVrZLh+DShh6rZGLyY9tcySt6o0Jk7fSs1lmEnPbBgeeqYmyHMd
Yn+ybhthmNkkpI8so70TA9roiVar4UmjnMBOiav62bo7ue26pKE5cWQyXw==
=mvBr
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- selftest fix
- force PTE mapping on device pages provided via VFIO
- fix detection of cacheable mapping at S2
- fallback to PMD/PTE mappings for composite huge pages
- fix accounting of Stage-2 PGD allocation
- fix AArch32 handling of some of the debug registers
- simplify host HYP entry
- fix stray pointer conversion on nVHE TLB invalidation
- fix initialization of the nVHE code
- simplify handling of capabilities exposed to HYP
- nuke VCPUs caught using a forbidden AArch32 EL0
x86:
- new nested virtualization selftest
- miscellaneous fixes
- make W=1 fixes
- reserve new CPUID bit in the KVM leaves"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: vmx: remove unused variable
KVM: selftests: Don't require THP to run tests
KVM: VMX: eVMCS: make evmcs_sanitize_exec_ctrls() work again
KVM: selftests: test behavior of unmapped L2 APIC-access address
KVM: x86: Fix NULL dereference at kvm_msr_ignored_check()
KVM: x86: replace static const variables with macros
KVM: arm64: Handle Asymmetric AArch32 systems
arm64: cpufeature: upgrade hyp caps to final
arm64: cpufeature: reorder cpus_have_{const, final}_cap()
KVM: arm64: Factor out is_{vhe,nvhe}_hyp_code()
KVM: arm64: Force PTE mapping on fault resulting in a device mapping
KVM: arm64: Use fallback mapping sizes for contiguous huge page sizes
KVM: arm64: Fix masks in stage2_pte_cacheable()
KVM: arm64: Fix AArch32 handling of DBGD{CCINT,SCRext} and DBGVCR
KVM: arm64: Allocate stage-2 pgd pages with GFP_KERNEL_ACCOUNT
KVM: arm64: Drop useless PAN setting on host EL1 to EL2 transition
KVM: arm64: Remove leftover kern_hyp_va() in nVHE TLB invalidation
KVM: arm64: Don't corrupt tpidr_el2 on failed HVC call
x86/kvm: Reserve KVM_FEATURE_MSI_EXT_DEST_ID
- Force PTE mapping on device pages provided via VFIO
- Fix detection of cacheable mapping at S2
- Fallback to PMD/PTE mappings for composite huge pages
- Fix accounting of Stage-2 PGD allocation
- Fix AArch32 handling of some of the debug registers
- Simplify host HYP entry
- Fix stray pointer conversion on nVHE TLB invalidation
- Fix initialization of the nVHE code
- Simplify handling of capabilities exposed to HYP
- Nuke VCPUs caught using a forbidden AArch32 EL0
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAl+cO3oPHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDJdoP/jiKYR8iVkq/RmIsQl383KwQiJGTMi0iL2Zw
/tHnf8bKowAPyG8bqyXMJqlWOb7tcp6U3m+WhENAZHWH02r2M921q0DGVW5p48ou
Ek4zJnFF1iL5ryOBgROKK1nymUZOi3W1a1SsD6ZPImQsKsjNGbqKgWsGs8i9ft0P
vkNZwlqebzJp+OR3agJemc8dkXcGlcRHk7fffdMcU8jsF5RJ9zC0XU0+scKryxhV
o8PzKSlwCeisyL+Vz+s7POzoD3Rt+P+qjblz5NWqy/NHuLh+V9hzUSDOjWbZb70f
Er29vGv7Yjb4nKK2KUzNqirSfXsRylfsjGr+YibP6uKEUMuUm/V41DqzT7nMalIm
cOBGtPk6W9wOL8JNDmlyVGCfATI+5RrErQ8nFClrPu3qw4Hv4pb1Ad5OgAhNE0u1
PfUyBBtQKNAjTdVCRfSuFL4d2yegy1rrpCmYWrvdQjLlXemwgYgKnSQN98cZHgjA
foCAP5gJpAWGualyhKJx2CkY/5deeWKS39ISiNgHo5eRvKsGEnMN7j9UX77VbhRr
PkwCmeUJ3kjzaAfmtcBN/iLjwQbWypidjX2Vbfl5WoVdLuiYXFZIvsdaqRHGl56F
5zhYxM8DKODNEJKMl7a89oEFGKy8x1PQ0kqer9a6GBWkNDrQMOSL4+FkxCyM2m9g
RoHtmdy0
=gVaX
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 5.10, take #1
- Force PTE mapping on device pages provided via VFIO
- Fix detection of cacheable mapping at S2
- Fallback to PMD/PTE mappings for composite huge pages
- Fix accounting of Stage-2 PGD allocation
- Fix AArch32 handling of some of the debug registers
- Simplify host HYP entry
- Fix stray pointer conversion on nVHE TLB invalidation
- Fix initialization of the nVHE code
- Simplify handling of capabilities exposed to HYP
- Nuke VCPUs caught using a forbidden AArch32 EL0
We finalize caps before initializing kvm hyp code, and any use of
cpus_have_const_cap() in kvm hyp code generates redundant and
potentially unsound code to read the cpu_hwcaps array.
A number of helper functions used in both hyp context and regular kernel
context use cpus_have_const_cap(), as some regular kernel code runs
before the capabilities are finalized. It's tedious and error-prone to
write separate copies of these for hyp and non-hyp code.
So that we can avoid the redundant code, let's automatically upgrade
cpus_have_const_cap() to cpus_have_final_cap() when used in hyp context.
With this change, there's never a reason to access to cpu_hwcaps array
from hyp code, and we don't need to create an NVHE alias for this.
This should have no effect on non-hyp code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: David Brazdil <dbrazdil@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201026134931.28246-4-mark.rutland@arm.com
The call to rcu_cpu_starting() in secondary_start_kernel() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:
WARNING: suspicious RCU usage
-----------------------------
kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
other info that might help us debug this:
RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 1
no locks held by swapper/1/0.
Call trace:
dump_backtrace+0x0/0x3c8
show_stack+0x14/0x60
dump_stack+0x14c/0x1c4
lockdep_rcu_suspicious+0x134/0x14c
__lock_acquire+0x1c30/0x2600
lock_acquire+0x274/0xc48
_raw_spin_lock+0xc8/0x140
vprintk_emit+0x90/0x3d0
vprintk_default+0x34/0x40
vprintk_func+0x378/0x590
printk+0xa8/0xd4
__cpuinfo_store_cpu+0x71c/0x868
cpuinfo_store_cpu+0x2c/0xc8
secondary_start_kernel+0x244/0x318
This is avoided by moving the call to rcu_cpu_starting up near the
beginning of the secondary_start_kernel() function.
Signed-off-by: Qian Cai <cai@redhat.com>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Link: https://lore.kernel.org/r/20201028182614.13655-1-cai@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
On Cortex-A77 r0p0 and r1p0, a sequence of a non-cacheable or device load
and a store exclusive or PAR_EL1 read can cause a deadlock.
The workaround requires a DMB SY before and after a PAR_EL1 register
read. In addition, it's possible an interrupt (doing a device read) or
KVM guest exit could be taken between the DMB and PAR read, so we
also need a DMB before returning from interrupt and before returning to
a guest.
A deadlock is still possible with the workaround as KVM guests must also
have the workaround. IOW, a malicious guest can deadlock an affected
systems.
This workaround also depends on a firmware counterpart to enable the h/w
to insert DMB SY after load and store exclusive instructions. See the
errata document SDEN-1152370 v10 [1] for more information.
[1] https://static.docs.arm.com/101992/0010/Arm_Cortex_A77_MP074_Software_Developer_Errata_Notice_v10.pdf
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20201028182839.166037-2-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Commit 76085aff29 ("efi/libstub/arm64: align PE/COFF sections to segment
alignment") increased the PE/COFF section alignment to match the minimum
segment alignment of the kernel image, which ensures that the kernel does
not need to be moved around in memory by the EFI stub if it was built as
relocatable.
However, the first PE/COFF section starts at _stext, which is only 4 KB
aligned, and so the section layout is inconsistent. Existing EFI loaders
seem to care little about this, but it is better to clean this up.
So let's pad the header to 64 KB to match the PE/COFF section alignment.
Fixes: 76085aff29 ("efi/libstub/arm64: align PE/COFF sections to segment alignment")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20201027073209.2897-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Now that we started making the linker warn about orphan sections
(input sections that are not explicitly consumed by an output section),
some configurations produce the following warning:
aarch64-linux-gnu-ld: warning: orphan section `.igot.plt' from
`arch/arm64/kernel/head.o' being placed in section `.igot.plt'
It could be any file that triggers this - head.o is simply the first
input file in the link - and the resulting .igot.plt section never
actually appears in vmlinux as it turns out to be empty.
So let's add .igot.plt to our collection of input sections to disregard
unless they are empty.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Jessica Yu <jeyu@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201028133332.5571-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The icache_policy_str[] definition causes a warning when extra
warning flags are enabled:
arch/arm64/kernel/cpuinfo.c:38:26: warning: initialized field overwritten [-Woverride-init]
38 | [ICACHE_POLICY_VIPT] = "VIPT",
| ^~~~~~
arch/arm64/kernel/cpuinfo.c:38:26: note: (near initialization for 'icache_policy_str[2]')
arch/arm64/kernel/cpuinfo.c:39:26: warning: initialized field overwritten [-Woverride-init]
39 | [ICACHE_POLICY_PIPT] = "PIPT",
| ^~~~~~
arch/arm64/kernel/cpuinfo.c:39:26: note: (near initialization for 'icache_policy_str[3]')
arch/arm64/kernel/cpuinfo.c:40:27: warning: initialized field overwritten [-Woverride-init]
40 | [ICACHE_POLICY_VPIPT] = "VPIPT",
| ^~~~~~~
arch/arm64/kernel/cpuinfo.c:40:27: note: (near initialization for 'icache_policy_str[0]')
There is no real need for the default initializer here, as printing a
NULL string is harmless. Rewrite the logic to have an explicit
reserved value for the only one that uses the default value.
This partially reverts the commit that removed ICACHE_POLICY_AIVIVT.
Fixes: 155433cb36 ("arm64: cache: Remove support for ASID-tagged VIVT I-caches")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201026193807.3816388-1-arnd@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
According to the SMCCC spec[1](7.5.2 Discovery) the
ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and
SMCCC_RET_NOT_SUPPORTED.
0 is "workaround required and safe to call this function"
1 is "workaround not required but safe to call this function"
SMCCC_RET_NOT_SUPPORTED is "might be vulnerable or might not be, who knows, I give up!"
SMCCC_RET_NOT_SUPPORTED might as well mean "workaround required, except
calling this function may not work because it isn't implemented in some
cases". Wonderful. We map this SMC call to
0 is SPECTRE_MITIGATED
1 is SPECTRE_UNAFFECTED
SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE
For KVM hypercalls (hvc), we've implemented this function id to return
SMCCC_RET_NOT_SUPPORTED, 0, and SMCCC_RET_NOT_REQUIRED. One of those
isn't supposed to be there. Per the code we call
arm64_get_spectre_v2_state() to figure out what to return for this
feature discovery call.
0 is SPECTRE_MITIGATED
SMCCC_RET_NOT_REQUIRED is SPECTRE_UNAFFECTED
SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE
Let's clean this up so that KVM tells the guest this mapping:
0 is SPECTRE_MITIGATED
1 is SPECTRE_UNAFFECTED
SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE
Note: SMCCC_RET_NOT_AFFECTED is 1 but isn't part of the SMCCC spec
Fixes: c118bbb527 ("arm64: KVM: Propagate full Spectre v2 workaround state to KVM guests")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://developer.arm.com/documentation/den0028/latest [1]
Link: https://lore.kernel.org/r/20201023154751.1973872-1-swboyd@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
As it stands now, the vdso32 Makefile hardcodes the linker to ld.bfd
using -fuse-ld=bfd with $(CC). This was taken from the arm vDSO
Makefile, as the comment notes, done in commit d2b30cd4b7 ("ARM:
8384/1: VDSO: force use of BFD linker").
Commit fe00e50b2d ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to
link VDSO") changed that Makefile to use $(LD) directly instead of
through $(CC), which matches how the rest of the kernel operates. Since
then, LD=ld.lld means that the arm vDSO will be linked with ld.lld,
which has shown no problems so far.
Allow ld.lld to link this vDSO as we do the regular arm vDSO. To do
this, we need to do a few things:
* Add a LD_COMPAT variable, which defaults to $(CROSS_COMPILE_COMPAT)ld
with gcc and $(LD) if LLVM is 1, which will be ld.lld, or
$(CROSS_COMPILE_COMPAT)ld if not, which matches the logic of the main
Makefile. It is overrideable for further customization and avoiding
breakage.
* Eliminate cc32-ldoption, which matches commit 055efab312 ("kbuild:
drop support for cc-ldoption").
With those, we can use $(LD_COMPAT) in cmd_ldvdso and change the flags
from compiler linker flags to linker flags directly. We eliminate
-mfloat-abi=soft because it is not handled by the linker.
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/1033
Link: https://lore.kernel.org/r/20201020011406.1818918-1-natechancellor@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
PPC:
- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixes
x86:
- allow trapping unknown MSRs to userspace
- allow userspace to force #GP on specific MSRs
- INVPCID support on AMD
- nested AMD cleanup, on demand allocation of nested SVM state
- hide PV MSRs and hypercalls for features not enabled in CPUID
- new test for MSR_IA32_TSC writes from host and guest
- cleanups: MMU, CPUID, shared MSRs
- LAPIC latency optimizations ad bugfixes
For x86, also included in this pull request is a new alternative and
(in the future) more scalable implementation of extended page tables
that does not need a reverse map from guest physical addresses to
host physical addresses. For now it is disabled by default because
it is still lacking a few of the existing MMU's bells and whistles.
However it is a very solid piece of work and it is already available
for people to hammer on it.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl+S8dsUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroM40Af+M46NJmuS5rcwFfybvK/c42KT6svX
Co1NrZDwzSQ2mMy3WQzH9qeLvb+nbY4sT3n5BPNPNsT+aIDPOTDt//qJ2/Ip9UUs
tRNea0MAR96JWLE7MSeeRxnTaQIrw/AAZC0RXFzZvxcgytXwdqBExugw4im+b+dn
Dcz8QxX1EkwT+4lTm5HC0hKZAuo4apnK1QkqCq4SdD2QVJ1YE6+z7pgj4wX7xitr
STKD6q/Yt/0ndwqS0GSGbyg0jy6mE620SN6isFRkJYwqfwLJci6KnqvEK67EcNMu
qeE017K+d93yIVC46/6TfVHzLR/D1FpQ8LZ16Yl6S13OuGIfAWBkQZtPRg==
=AD6a
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"For x86, there is a new alternative and (in the future) more scalable
implementation of extended page tables that does not need a reverse
map from guest physical addresses to host physical addresses.
For now it is disabled by default because it is still lacking a few of
the existing MMU's bells and whistles. However it is a very solid
piece of work and it is already available for people to hammer on it.
Other updates:
ARM:
- New page table code for both hypervisor and guest stage-2
- Introduction of a new EL2-private host context
- Allow EL2 to have its own private per-CPU variables
- Support of PMU event filtering
- Complete rework of the Spectre mitigation
PPC:
- Fix for running nested guests with in-kernel IRQ chip
- Fix race condition causing occasional host hard lockup
- Minor cleanups and bugfixes
x86:
- allow trapping unknown MSRs to userspace
- allow userspace to force #GP on specific MSRs
- INVPCID support on AMD
- nested AMD cleanup, on demand allocation of nested SVM state
- hide PV MSRs and hypercalls for features not enabled in CPUID
- new test for MSR_IA32_TSC writes from host and guest
- cleanups: MMU, CPUID, shared MSRs
- LAPIC latency optimizations ad bugfixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
kvm: x86/mmu: NX largepage recovery for TDP MMU
kvm: x86/mmu: Don't clear write flooding count for direct roots
kvm: x86/mmu: Support MMIO in the TDP MMU
kvm: x86/mmu: Support write protection for nesting in tdp MMU
kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
kvm: x86/mmu: Support dirty logging for the TDP MMU
kvm: x86/mmu: Support changed pte notifier in tdp MMU
kvm: x86/mmu: Add access tracking for tdp_mmu
kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
kvm: x86/mmu: Add TDP MMU PF handler
kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
KVM: Cache as_id in kvm_memory_slot
kvm: x86/mmu: Add functions to handle changed TDP SPTEs
kvm: x86/mmu: Allocate and free TDP MMU roots
kvm: x86/mmu: Init / Uninit the TDP MMU
kvm: x86/mmu: Introduce tdp_iter
KVM: mmu: extract spte.h and spte.c
KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl+SOXIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgptrcD/93VUDmRAn73ChKNd0TtXUicJlAlNLVjvfs
VFTXWBDnlJnGkZT7ElkDD9b8dsz8l4xGf/QZ5dzhC/th2OsfObQkSTfe0lv5cCQO
mX7CRSrDpjaHtW+WGPDa0oQsGgIfpqUz2IOg9NKbZZ1LJ2uzYfdOcf3oyRgwZJ9B
I3sh1vP6OzjZVVCMmtMTM+sYZEsDoNwhZwpkpiwMmj8tYtOPgKCYKpqCiXrGU0x2
ML5FtDIwiwU+O3zYYdCBWqvCb2Db0iA9Aov2whEBz/V2jnmrN5RMA/90UOh1E2zG
br4wM1Wt3hNrtj5qSxZGlF/HEMYJVB8Z2SgMjYu4vQz09qRVVqpGdT/dNvLAHQWg
w4xNCj071kVZDQdfwnqeWSKYUau9Xskvi8xhTT+WX8a5CsbVrM9vGslnS5XNeZ6p
h2D3Q+TAYTvT756icTl0qsYVP7PrPY7DdmQYu0q+Lc3jdGI+jyxO2h9OFBRLZ3p6
zFX2N8wkvvCCzP2DwVnnhIi/GovpSh7ksHnb039F36Y/IhZPqV1bGqdNQVdanv6I
8fcIDM6ltRQ7dO2Br5f1tKUZE9Pm6x60b/uRVjhfVh65uTEKyGRhcm5j9ztzvQfI
cCBg4rbVRNKolxuDEkjsAFXVoiiEEsb7pLf4pMO+Dr62wxFG589tQNySySneUIVZ
J9ILnGAAeQ==
=aVWo
-----END PGP SIGNATURE-----
Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block
Pull arch task_work cleanups from Jens Axboe:
"Two cleanups that don't fit other categories:
- Finally get the task_work_add() cleanup done properly, so we don't
have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
all callers, and also fixes up the documentation for
task_work_add().
- While working on some TIF related changes for 5.11, this
TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
duplication for how that is handled"
* tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
task_work: cleanup notification modes
tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
- Improve performance of Spectre-v2 mitigation on Falkor CPUs (if you're lucky
enough to have one)
- Select HAVE_MOVE_PMD. This has been shown to improve mremap() performance,
which is used heavily by the Android runtime GC, and it seems we forgot to
enable this upstream back in 2018.
- Ensure linker flags are consistent between LLVM and BFD
- Fix stale comment in Spectre mitigation rework
- Fix broken copyright header
- Fix KASLR randomisation of the linear map
- Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+QEPAQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNE8jB/0YNYKO9mis/Xn5KcOCwlg4dbc2uVBknZXD
f7otEJ6SOax2HcWz8qJlrJ+qbGFawPIqFBUAM0vU1VmoyctIoKRFTA8ACfWfWtnK
QBfHrcxtJCh/GGq+E1IyuqWzCjppeY/7gYVdgi1xDEZRSaLz53MC1GVBwKBtu5cf
X2Bfm8d9+PSSnmKfpO65wSCTvN3PQX1SNEHwwTWFZQx0p7GcQK1DdwoobM6dRnVy
+e984ske+2a+nTrkhLSyQIgsfHuLB4pD6XdM/UOThnfdNxdQ0dUGn375sXP+b4dW
7MTH9HP/dXIymTcuErMXOHJXLk/zUiUBaOxkmOxdvrhQd0uFNFIc
=e9p9
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull more arm64 updates from Will Deacon:
"A small selection of further arm64 fixes and updates. Most of these
are fixes that came in during the merge window, with the exception of
the HAVE_MOVE_PMD mremap() speed-up which we discussed back in 2018
and somehow forgot to enable upstream.
- Improve performance of Spectre-v2 mitigation on Falkor CPUs (if
you're lucky enough to have one)
- Select HAVE_MOVE_PMD. This has been shown to improve mremap()
performance, which is used heavily by the Android runtime GC, and
it seems we forgot to enable this upstream back in 2018.
- Ensure linker flags are consistent between LLVM and BFD
- Fix stale comment in Spectre mitigation rework
- Fix broken copyright header
- Fix KASLR randomisation of the linear map
- Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)"
Link: https://lore.kernel.org/kvmarm/20181108181201.88826-3-joelaf@google.com/
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: proton-pack: Update comment to reflect new function name
arm64: spectre-v2: Favour CPU-specific mitigation at EL2
arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
arm64: Fix a broken copyright header in gen_vdso_offsets.sh
arm64: mremap speedup - Enable HAVE_MOVE_PMD
arm64: mm: use single quantity to represent the PA to VA translation
arm64: reject prctl(PR_PAC_RESET_KEYS) on compat tasks
- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries
- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy
- Preprocess scripts/modules.lds.S to allow CONFIG options in the module
linker script
- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions
- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
- Use sha1 build id for both BFD linker and LLD
- Improve deb-pkg for reproducible builds and rootless builds
- Remove stale, useless scripts/namespace.pl
- Turn -Wreturn-type warning into error
- Fix build error of deb-pkg when CONFIG_MODULES=n
- Replace 'hostname' command with more portable 'uname -n'
- Various Makefile cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl+RfS0VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGG1QP/2hzoMzK1YXErPUhGrhYU1rxz7Nu
HkLTIkyKF1HPwSJf5XyNW/FTBI4SDlkNoVg/weEDCS1yFxxpvQLIck8ChzA1kIIM
P+1IfBWOTzqn91XsapU2zwSno3gylphVchVIvYAB3oLUotGeMSluy1cQtBRzyA5D
rj2Q7H8fzkzk3YoBcBC/BOKDlfo/usqQ1X/gsfRFwN/BJxeZSYoujNBE7KtHaDsd
8K/ggBIqmST4NBn+M8c11d8CxzvWbtG1gq3EkUL5nG8T13DsGn1EFC0SPt85bkvv
f9YywfJi37HixhZzK6tXYjN/PWoiEY6z90mhd0NtZghQT7kQMiTQ3sWrM8dX3ssf
phBzO94uFQDjhyxOaSSsCoI/TIciAPo4+G8PNjcaEtj63IEfhEz/dnlstYwY5Y9P
Pp3aZtVjSGJwGW2u2EUYj6paFVqjf6DXQjQKPNHnsYCEidIvFTjjguRGvx9gl6mx
yd8oseOsAtOEf0alRe9MMdvN17O3UrRAxgBdap7fktg02TLVRGxZIbuwKmBf29ho
ORl9zeFkYBn6XQFyuItJoXy/kYFyHDaBEPYCRQcY4dwqcjZIiAc/FhYbqYthJ59L
5vLN2etmDIVSuUv1J5nBqHHGCqJChykbqg7riQ651dCNKw4gZB8ctCay2lXhBXMg
1mqOcoG5WWL7//F+
=tZRN
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Support 'make compile_commands.json' to generate the compilation
database more easily, avoiding stale entries
- Support 'make clang-analyzer' and 'make clang-tidy' for static checks
using clang-tidy
- Preprocess scripts/modules.lds.S to allow CONFIG options in the
module linker script
- Drop cc-option tests from compiler flags supported by our minimal
GCC/Clang versions
- Use always 12-digits commit hash for CONFIG_LOCALVERSION_AUTO=y
- Use sha1 build id for both BFD linker and LLD
- Improve deb-pkg for reproducible builds and rootless builds
- Remove stale, useless scripts/namespace.pl
- Turn -Wreturn-type warning into error
- Fix build error of deb-pkg when CONFIG_MODULES=n
- Replace 'hostname' command with more portable 'uname -n'
- Various Makefile cleanups
* tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
kbuild: Use uname for LINUX_COMPILE_HOST detection
kbuild: Only add -fno-var-tracking-assignments for old GCC versions
kbuild: remove leftover comment for filechk utility
treewide: remove DISABLE_LTO
kbuild: deb-pkg: clean up package name variables
kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
kbuild: enforce -Werror=return-type
scripts: remove namespace.pl
builddeb: Add support for all required debian/rules targets
builddeb: Enable rootless builds
builddeb: Pass -n to gzip for reproducible packages
kbuild: split the build log of kallsyms
kbuild: explicitly specify the build id style
scripts/setlocalversion: make git describe output more reliable
kbuild: remove cc-option test of -Werror=date-time
kbuild: remove cc-option test of -fno-stack-check
kbuild: remove cc-option test of -fno-strict-overflow
kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
kbuild: do not create built-in objects for external module builds
...
The function detect_harden_bp_fw() is gone after commit d4647f0a2a
("arm64: Rewrite Spectre-v2 mitigation code"). Update this comment to
reflect the new state of affairs.
Fixes: d4647f0a2a ("arm64: Rewrite Spectre-v2 mitigation code")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Cc: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201020214544.3206838-3-swboyd@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
This change removes all instances of DISABLE_LTO from
Makefiles, as they are currently unused, and the preferred
method of disabling LTO is to filter out the flags instead.
Note added by Masahiro Yamada:
DISABLE_LTO was added as preparation for GCC LTO, but GCC LTO was
not pulled into the mainline. (https://lkml.org/lkml/2014/4/8/272)
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Spectre-v2 can be mitigated on Falkor CPUs either by calling into
firmware or by issuing a magic, CPU-specific sequence of branches.
Although the latter is faster, the size of the code sequence means that
it cannot be used in the EL2 vectors, and so there is a need for both
mitigations to co-exist in order to achieve optimal performance.
Change the mitigation selection logic for Spectre-v2 so that the
CPU-specific mitigation is used only when the firmware mitigation is
also available, rather than when a firmware mitigation is unavailable.
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
I was going to copy this but I didn't want to chase around the build
system stuff so I did it a different way.
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Link: https://lore.kernel.org/r/20201017002637.503579-1-palmer@dabbelt.com
Signed-off-by: Will Deacon <will@kernel.org>
All the callers currently do this, clean it up and move the clearing
into tracehook_notify_resume() instead.
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It doesn't make sense to issue prctl(PR_PAC_RESET_KEYS) on a
compat task because the 32-bit instruction set does not offer PAuth
instructions. For consistency with other 64-bit only prctls such as
{SET,GET}_TAGGED_ADDR_CTRL, reject the prctl on compat tasks.
Although this is a userspace-visible change, maybe it isn't too late
to make this change given that the hardware isn't available yet and
it's very unlikely that anyone has 32-bit software that actually
depends on this succeeding.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Link: https://linux-review.googlesource.com/id/Ie885a1ff84ab498cc9f62d6451e9f2cfd4b1d06a
Link: https://lore.kernel.org/r/20201014052430.11630-1-pcc@google.com
[will: Do the same for the SVE prctl()s]
Signed-off-by: Will Deacon <will@kernel.org>
- Rework cpufreq statistics collection to allow it to take place
when fast frequency switching is enabled in the governor (Viresh
Kumar).
- Make the cpufreq core set the frequency scale on behalf of the
driver and update several cpufreq drivers accordingly (Ionela
Voinescu, Valentin Schneider).
- Add new hardware support to the STI and qcom cpufreq drivers and
improve them (Alain Volmat, Manivannan Sadhasivam).
- Fix multiple assorted issues in cpufreq drivers (Jon Hunter,
Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan
Gerhold, Viresh Kumar).
- Fix several assorted issues in the operating performance points
(OPP) framework (Stephan Gerhold, Viresh Kumar).
- Allow devfreq drivers to fetch devfreq instances by DT enumeration
instead of using explicit phandles and modify the devfreq core
code to support driver-specific devfreq DT bindings (Leonard
Crestez, Chanwoo Choi).
- Improve initial hardware resetting in the tegra30 devfreq driver
and clean up the tegra cpuidle driver (Dmitry Osipenko).
- Update the cpuidle core to collect state entry rejection
statistics and expose them via sysfs (Lina Iyer).
- Improve the ACPI _CST code handling diagnostics (Chen Yu).
- Update the PSCI cpuidle driver to allow the PM domain
initialization to occur in the OSI mode as well as in the PC
mode (Ulf Hansson).
- Rework the generic power domains (genpd) core code to allow
domain power off transition to be aborted in the absence of the
"power off" domain callback (Ulf Hansson).
- Fix two suspend-to-idle issues in the ACPI EC driver (Rafael
Wysocki).
- Fix the handling of timer_expires in the PM-runtime framework on
32-bit systems and the handling of device links in it (Grygorii
Strashko, Xiang Chen).
- Add IO requests batching support to the hibernate image saving and
reading code and drop a bogus get_gendisk() from there (Xiaoyi
Chen, Christoph Hellwig).
- Allow PCIe ports to be put into the D3cold power state if they
are power-manageable via ACPI (Lukas Wunner).
- Add missing header file include to a power capping driver (Pujin
Shi).
- Clean up the qcom-cpr AVS driver a bit (Liu Shixin).
- Kevin Hilman steps down as designated reviwer of adaptive voltage
scaling (AVS) driverrs (Kevin Hilman).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl+F4A4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxX6QP/iELq9/OsH0aJdDQlY9tnh2Oa13+HB/Y
w1e6W+ZR/YjPgUpMVARwRLKf/gn7dUEwRDHVpGvDOyun+HACCPHB2hg8iktbxdVl
NFAVGZCCRezXqz3opL1hl8C3Dh0CqUPUjWXGMr+Lw2TZQKT+hx9K1dm9Epe3ivyT
RlVH/wifei80cFRcUUj7DI5KLCAyk+uKkZIFnZHAGKK6qOHMqRL5sDZsMUwWpd2i
AdghABjePbaiLTAoZuUsJINAGY4DnIt6ASRdMJ4iksiD6pFITwFs0HSOPe7hZLlv
zbwDPI5+TIkrOy9/aWoMaEIH1OQiFN/O++Slvdjn7gMsRgoW4d300ru4Jo1pOHxb
5twxagCCqlOf4YAaSrMCH4HT+c6fOWoGj2AKzX3DMJyO3/WN+8XNvUxKtC5Px1u+
pWRASjfQMO2j6nNjTCTwDJdYzggiKa54rYH2k7svX7XnTIAf+2E1gv8b4rMTgQrZ
0rq9kULYlhgk3EYjd/DndkvxunRlmiqhzrYB4jc9eDSPNzB8FZEbw1ZMRQTFfjK0
kp0vaEpTJ7JfKSCfluB4UmTuQoGogLl0xbzc+2NNIpwdNmrH2Srvq6wbj35jEDTU
tqsTsBP+XZFOWyFOw/L2J47LTOp0TJnz8z4aycLfrmdNUVnXJoU1sXgFlDzETMgT
0E6cTVwLF7Zi
=rGhy
-----END PGP SIGNATURE-----
Merge tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These rework the collection of cpufreq statistics to allow it to take
place if fast frequency switching is enabled in the governor, rework
the frequency invariance handling in the cpufreq core and drivers, add
new hardware support to a couple of cpufreq drivers, fix a number of
assorted issues and clean up the code all over.
Specifics:
- Rework cpufreq statistics collection to allow it to take place when
fast frequency switching is enabled in the governor (Viresh Kumar).
- Make the cpufreq core set the frequency scale on behalf of the
driver and update several cpufreq drivers accordingly (Ionela
Voinescu, Valentin Schneider).
- Add new hardware support to the STI and qcom cpufreq drivers and
improve them (Alain Volmat, Manivannan Sadhasivam).
- Fix multiple assorted issues in cpufreq drivers (Jon Hunter,
Krzysztof Kozlowski, Matthias Kaehlcke, Pali Rohár, Stephan
Gerhold, Viresh Kumar).
- Fix several assorted issues in the operating performance points
(OPP) framework (Stephan Gerhold, Viresh Kumar).
- Allow devfreq drivers to fetch devfreq instances by DT enumeration
instead of using explicit phandles and modify the devfreq core code
to support driver-specific devfreq DT bindings (Leonard Crestez,
Chanwoo Choi).
- Improve initial hardware resetting in the tegra30 devfreq driver
and clean up the tegra cpuidle driver (Dmitry Osipenko).
- Update the cpuidle core to collect state entry rejection statistics
and expose them via sysfs (Lina Iyer).
- Improve the ACPI _CST code handling diagnostics (Chen Yu).
- Update the PSCI cpuidle driver to allow the PM domain
initialization to occur in the OSI mode as well as in the PC mode
(Ulf Hansson).
- Rework the generic power domains (genpd) core code to allow domain
power off transition to be aborted in the absence of the "power
off" domain callback (Ulf Hansson).
- Fix two suspend-to-idle issues in the ACPI EC driver (Rafael
Wysocki).
- Fix the handling of timer_expires in the PM-runtime framework on
32-bit systems and the handling of device links in it (Grygorii
Strashko, Xiang Chen).
- Add IO requests batching support to the hibernate image saving and
reading code and drop a bogus get_gendisk() from there (Xiaoyi
Chen, Christoph Hellwig).
- Allow PCIe ports to be put into the D3cold power state if they are
power-manageable via ACPI (Lukas Wunner).
- Add missing header file include to a power capping driver (Pujin
Shi).
- Clean up the qcom-cpr AVS driver a bit (Liu Shixin).
- Kevin Hilman steps down as designated reviwer of adaptive voltage
scaling (AVS) drivers (Kevin Hilman)"
* tag 'pm-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
cpufreq: stats: Fix string format specifier mismatch
arm: disable frequency invariance for CONFIG_BL_SWITCHER
cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale()
cpufreq: stats: Add memory barrier to store_reset()
cpufreq: schedutil: Simplify sugov_fast_switch()
ACPI: EC: PM: Drop ec_no_wakeup check from acpi_ec_dispatch_gpe()
ACPI: EC: PM: Flush EC work unconditionally after wakeup
PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
PM: hibernate: remove the bogus call to get_gendisk() in software_resume()
cpufreq: Move traces and update to policy->cur to cpufreq core
cpufreq: stats: Enable stats for fast-switch as well
cpufreq: stats: Mark few conditionals with unlikely()
cpufreq: stats: Remove locking
cpufreq: stats: Defer stats update to cpufreq_stats_record_transition()
PM: domains: Allow to abort power off when no ->power_off() callback
PM: domains: Rename power state enums for genpd
PM / devfreq: tegra30: Improve initial hardware resetting
PM / devfreq: event: Change prototype of devfreq_event_get_edev_by_phandle function
PM / devfreq: Change prototype of devfreq_get_devfreq_by_phandle function
PM / devfreq: Add devfreq_get_devfreq_by_node function
...
for_each_memblock() is used to iterate over memblock.memory in a few
places that use data from memblock_region rather than the memory ranges.
Introduce separate for_each_mem_region() and
for_each_reserved_mem_region() to improve encapsulation of memblock
internals from its users.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org> [x86]
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS]
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-18-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Iteration over memblock.reserved with for_each_reserved_mem_region() used
__next_reserved_mem_region() that implemented a subset of
__next_mem_region().
Use __for_each_mem_range() and, essentially, __next_mem_region() with
appropriate parameters to reduce code duplication.
While on it, rename for_each_reserved_mem_region() to
for_each_reserved_mem_range() for consistency.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-17-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently for_each_mem_range() and for_each_mem_range_rev() iterators are
the most generic way to traverse memblock regions. As such, they have 8
parameters and they are hardly convenient to users. Most users choose to
utilize one of their wrappers and the only user that actually needs most
of the parameters is memblock itself.
To avoid yet another naming for memblock iterators, rename the existing
for_each_mem_range[_rev]() to __for_each_mem_range[_rev]() and add a new
for_each_mem_range[_rev]() wrappers with only index, start and end
parameters.
The new wrapper nicely fits into init_unavailable_mem() and will be used
in upcoming changes to simplify memblock traversals.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> [MIPS]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-11-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
because the heuristics that various linkers & compilers use to handle them
(include these bits into the output image vs discarding them silently)
are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook (et al)
adds build time asserts and build time warnings if there's any orphan section
in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix a metric
ton of dependencies and some outright bugs related to this, before we can
finally enable the checks on the x86, ARM and ARM64 platforms.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl+Edv4RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hiKBAApdJEOaK7hMc3013DYNctklIxEPJL2mFJ
11YJRIh4pUJTF0TE+EHT/D+rSIuRsyuoSmOQBQ61/wVSnyG067GjjVJRqh/eYaJ1
fDhJi2FuHOjXl+CiN0KxzBjjp+V4NhF7jHT59tpQSvfZeg7FjteoxfztxaCp5ek3
S3wHB3CC4c4jE3lfjHem1E9/PwT4kwPYx1c3gAUdEqJdjkihjX9fWusfjLeqW6/d
Y5VkApi6bL9XiZUZj5l0dEIweLJJ86+PkKJqpo3spxxEak1LSn1MEix+lcJ8e1Kg
sb/bEEivDcmFlFWOJnn0QLquCR0Cx5bz1pwsL0tuf0yAd4+sXX5IMuGUysZlEdKM
BHL9h5HbevGF4BScwZwZH7lyEg7q67s5KnRu4hxy0Swfcj7y0oT/9lXqpbpZ2DqO
Hd+bRRQKIbqnTMp0hcit9LfpLp93vj0dBlaV5ocAJJlu62u9VnwGG5HQuZ5giLUr
kA1SLw63Y1wopFRxgFyER8les7eLsu0zxHeK44rRVlVnfI99OMTOgVNicmDFy3Fm
AfcnfJG0BqBEJGQz5es34uQQKKBwFPtC9NztopI62KiwOspYYZyrO1BNxdOc6DlS
mIHrmO89HMXuid5eolvLaFqUWirHoWO8TlycgZxUWVHc2txVPjAEU/axouU/dSSU
w/6GpzAa+7g=
=fXAw
-----END PGP SIGNATURE-----
Merge tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull orphan section checking from Ingo Molnar:
"Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle
them (include these bits into the output image vs discarding them
silently) are both highly idiosyncratic and also version dependent.
Instead of this historically problematic mess, this tree by Kees Cook
(et al) adds build time asserts and build time warnings if there's any
orphan section in the kernel or if a section is not sized as expected.
And because we relied on so many silent assumptions in this area, fix
a metric ton of dependencies and some outright bugs related to this,
before we can finally enable the checks on the x86, ARM and ARM64
platforms"
* tag 'core-build-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/boot/compressed: Warn on orphan section placement
x86/build: Warn on orphan section placement
arm/boot: Warn on orphan section placement
arm/build: Warn on orphan section placement
arm64/build: Warn on orphan section placement
x86/boot/compressed: Add missing debugging sections to output
x86/boot/compressed: Remove, discard, or assert for unwanted sections
x86/boot/compressed: Reorganize zero-size section asserts
x86/build: Add asserts for unwanted sections
x86/build: Enforce an empty .got.plt section
x86/asm: Avoid generating unused kprobe sections
arm/boot: Handle all sections explicitly
arm/build: Assert for unwanted sections
arm/build: Add missing sections
arm/build: Explicitly keep .ARM.attributes sections
arm/build: Refactor linker script headers
arm64/build: Assert for unwanted sections
arm64/build: Add missing DWARF sections
arm64/build: Use common DISCARDS in linker script
arm64/build: Remove .eh_frame* sections due to unwind tables
...
Core:
- Allow trimming of interrupt hierarchy to support odd hardware setups
where only a subset of the interrupts requires the full hierarchy.
- Allow the retrigger mechanism to follow a hierarchy to simplify
driver code.
- Provide a mechanism to force enable wakeup interrrupts on suspend.
- More infrastructure to handle IPIs in the core code
Architectures:
- Convert ARM/ARM64 IPI handling to utilize the interrupt core code.
Drivers:
- The usual pile of new interrupt chips (MStar, Actions Owl, TI PRUSS,
Designware ICTL)
- ARM(64) IPI related conversions
- Wakeup support for Qualcom PDC
- Prevent hierarchy corruption in the NVIDIA Tegra driver
- The usual small fixes, improvements and cleanups all over the place.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl+ENDsTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoTyXD/9oGq37/zjpCggtRWdTKGtKvndjodqt
82zTZ1eSukDSE3UoT7PL8cRQ/4MnRZ7Ke+Iidd2uUbWADfJN28+4d26wN/aYYlX7
HmI/zowBgK6CJweynHYEF9/C8g2v2SRg5HJCJSOSuVLnTKNLc/aHX5rc/FZXGd6v
K1BOHJFlzoU1w+OnFfoH4TeJdoKhzXi/T5zJFFtadOVIeCONxTEs4Fxkej2cuBsu
Nz38WfkPdOnyrVIPhA10KgigczcRkKXU0ot/bNH4s9j2ZIGdgtq3UIbH+itleW2S
bSWSShnlhSMS918pZNcR49iRyP2CsM+JxcHAmcbA6VPBpKbk2Pb5Zta8g08TZm+X
XxaDwPFoR4BG00B0L4uygEuHcE89mDy0gCFog0zG7sU+LuY4FYQSSMUqwIC4i/HJ
DJdWrVqnNHJFCS6wvBl9NO0lyuUrn2be2/IzUtZ3d0xbA0uJXfvI4WgFrbunoPEU
zgHblQN5nkDLWujjzC10C9vmTi1xxP6FiYcrMScZZ5US0JlHaptkoPOhs82KYQvV
0DPk06XGWnJMc27+MQYVIMDhQggi3It9pgDRhoyz9Xpgn9fmhhp0goL7KnFk9Hbr
BKFdW4VBbU0PZacoI6Q186lTQZRptTKfREL+bHvUL2Xyb0RO6nerBPzE5Wxwb2vW
PmHgFezXDVHbIQ==
=1ewL
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Allow trimming of interrupt hierarchy to support odd hardware
setups where only a subset of the interrupts requires the full
hierarchy.
- Allow the retrigger mechanism to follow a hierarchy to simplify
driver code.
- Provide a mechanism to force enable wakeup interrrupts on suspend.
- More infrastructure to handle IPIs in the core code
Architectures:
- Convert ARM/ARM64 IPI handling to utilize the interrupt core code.
Drivers:
- The usual pile of new interrupt chips (MStar, Actions Owl, TI
PRUSS, Designware ICTL)
- ARM(64) IPI related conversions
- Wakeup support for Qualcom PDC
- Prevent hierarchy corruption in the NVIDIA Tegra driver
- The usual small fixes, improvements and cleanups all over the
place"
* tag 'irq-core-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
dt-bindings: interrupt-controller: Add MStar interrupt controller
irqchip/irq-mst: Add MStar interrupt controller support
soc/tegra: pmc: Don't create fake interrupt hierarchy levels
soc/tegra: pmc: Allow optional irq parent callbacks
gpio: tegra186: Allow optional irq parent callbacks
genirq/irqdomain: Allow partial trimming of irq_data hierarchy
irqchip/qcom-pdc: Reset PDC interrupts during init
irqchip/qcom-pdc: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
pinctrl: qcom: Set IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
genirq/PM: Introduce IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND flag
pinctrl: qcom: Use return value from irq_set_wake() call
pinctrl: qcom: Set IRQCHIP_SET_TYPE_MASKED and IRQCHIP_MASK_ON_SUSPEND flags
ARM: Handle no IPI being registered in show_ipi_list()
MAINTAINERS: Add entries for Actions Semi Owl SIRQ controller
irqchip: Add Actions Semi Owl SIRQ controller
dt-bindings: interrupt-controller: Add Actions SIRQ controller binding
dt-bindings: dw-apb-ictl: Update binding to describe use as primary interrupt controller
irqchip/dw-apb-ictl: Add primary interrupt controller support
irqchip/dw-apb-ictl: Refactor priot to introducing hierarchical irq domains
genirq: Add stub for set_handle_irq() when !GENERIC_IRQ_MULTI_HANDLER
...
- Userspace support for the Memory Tagging Extension introduced by Armv8.5.
Kernel support (via KASAN) is likely to follow in 5.11.
- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
switching.
- Fix and subsequent rewrite of our Spectre mitigations, including the
addition of support for PR_SPEC_DISABLE_NOEXEC.
- Support for the Armv8.3 Pointer Authentication enhancements.
- Support for ASID pinning, which is required when sharing page-tables with
the SMMU.
- MM updates, including treating flush_tlb_fix_spurious_fault() as a no-op.
- Perf/PMU driver updates, including addition of the ARM CMN PMU driver and
also support to handle CPU PMU IRQs as NMIs.
- Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
- Implementation of ARCH_STACKWALK for unwinding.
- Improve reporting of unexpected kernel traps due to BPF JIT failure.
- Improve robustness of user-visible HWCAP strings and their corresponding
numerical constants.
- Removal of TEXT_OFFSET.
- Removal of some unused functions, parameters and prototypes.
- Removal of MPIDR-based topology detection in favour of firmware
description.
- Cleanups to handling of SVE and FPSIMD register state in preparation
for potential future optimisation of handling across syscalls.
- Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work.
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl+AUXMQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNFc1B/4q2Kabe+pPu7s1f58Q+OTaEfqcr3F1qh27
F1YpFZUYxg0GPfPsFrnbJpo5WKo7wdR9ceI9yF/GHjs7A/MSoQJis3pG6SlAd9c0
nMU5tCwhg9wfq6asJtl0/IPWem6cqqhdzC6m808DjeHuyi2CCJTt0vFWH3OeHEhG
cfmLfaSNXOXa/MjEkT8y1AXJ/8IpIpzkJeCRA1G5s18PXV9Kl5bafIo9iqyfKPLP
0rJljBmoWbzuCSMc81HmGUQI4+8KRp6HHhyZC/k0WEVgj3LiumT7am02bdjZlTnK
BeNDKQsv2Jk8pXP2SlrI3hIUTz0bM6I567FzJEokepvTUzZ+CVBi
=9J8H
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"There's quite a lot of code here, but much of it is due to the
addition of a new PMU driver as well as some arm64-specific selftests
which is an area where we've traditionally been lagging a bit.
In terms of exciting features, this includes support for the Memory
Tagging Extension which narrowly missed 5.9, hopefully allowing
userspace to run with use-after-free detection in production on CPUs
that support it. Work is ongoing to integrate the feature with KASAN
for 5.11.
Another change that I'm excited about (assuming they get the hardware
right) is preparing the ASID allocator for sharing the CPU page-table
with the SMMU. Those changes will also come in via Joerg with the
IOMMU pull.
We do stray outside of our usual directories in a few places, mostly
due to core changes required by MTE. Although much of this has been
Acked, there were a couple of places where we unfortunately didn't get
any review feedback.
Other than that, we ran into a handful of minor conflicts in -next,
but nothing that should post any issues.
Summary:
- Userspace support for the Memory Tagging Extension introduced by
Armv8.5. Kernel support (via KASAN) is likely to follow in 5.11.
- Selftests for MTE, Pointer Authentication and FPSIMD/SVE context
switching.
- Fix and subsequent rewrite of our Spectre mitigations, including
the addition of support for PR_SPEC_DISABLE_NOEXEC.
- Support for the Armv8.3 Pointer Authentication enhancements.
- Support for ASID pinning, which is required when sharing
page-tables with the SMMU.
- MM updates, including treating flush_tlb_fix_spurious_fault() as a
no-op.
- Perf/PMU driver updates, including addition of the ARM CMN PMU
driver and also support to handle CPU PMU IRQs as NMIs.
- Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
- Implementation of ARCH_STACKWALK for unwinding.
- Improve reporting of unexpected kernel traps due to BPF JIT
failure.
- Improve robustness of user-visible HWCAP strings and their
corresponding numerical constants.
- Removal of TEXT_OFFSET.
- Removal of some unused functions, parameters and prototypes.
- Removal of MPIDR-based topology detection in favour of firmware
description.
- Cleanups to handling of SVE and FPSIMD register state in
preparation for potential future optimisation of handling across
syscalls.
- Cleanups to the SDEI driver in preparation for support in KVM.
- Miscellaneous cleanups and refactoring work"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (148 commits)
Revert "arm64: initialize per-cpu offsets earlier"
arm64: random: Remove no longer needed prototypes
arm64: initialize per-cpu offsets earlier
kselftest/arm64: Check mte tagged user address in kernel
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Add utilities and a test to validate mte memory
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
...
ld's --build-id defaults to "sha1" style, while lld defaults to "fast".
The build IDs are very different between the two, which may confuse
programs that reference them.
Signed-off-by: Bill Wendling <morbo@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
This reverts commit 353e228eb3.
Qian Cai reports that TX2 no longer boots with his .config as it appears
that task_cpu() gets instrumented and used before KASAN has been
initialised.
Although Mark has a proposed fix, let's take the safe option of reverting
this for now and sorting it out properly later.
Link: https://lore.kernel.org/r/711bc57a314d8d646b41307008db2845b7537b3d.camel@redhat.com
Reported-by: Qian Cai <cai@redhat.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Late patches for 5.10: MTE selftests, minor KCSAN preparation and removal
of some unused prototypes.
(Amit Daniel Kachhap and others)
* for-next/late-arrivals:
arm64: random: Remove no longer needed prototypes
arm64: initialize per-cpu offsets earlier
kselftest/arm64: Check mte tagged user address in kernel
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Add utilities and a test to validate mte memory
The current initialization of the per-cpu offset register is difficult
to follow and this initialization is not always early enough for
upcoming instrumentation with KCSAN, where the instrumentation callbacks
use the per-cpu offset.
To make it possible to support KCSAN, and to simplify reasoning about
early bringup code, let's initialize the per-cpu offset earlier, before
we run any C code that may consume it. To do so, this patch adds a new
init_this_cpu_offset() helper that's called before the usual
primary/secondary start functions. For consistency, this is also used to
re-initialize the per-cpu offset after the runtime per-cpu areas have
been allocated (which can change CPU0's offset).
So that init_this_cpu_offset() isn't subject to any instrumentation that
might consume the per-cpu offset, it is marked with noinstr, preventing
instrumentation.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20201005164303.21389-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Add userspace support for the Memory Tagging Extension introduced by
Armv8.5.
(Catalin Marinas and others)
* for-next/mte: (30 commits)
arm64: mte: Fix typo in memory tagging ABI documentation
arm64: mte: Add Memory Tagging Extension documentation
arm64: mte: Kconfig entry
arm64: mte: Save tags when hibernating
arm64: mte: Enable swap of tagged pages
mm: Add arch hooks for saving/restoring tags
fs: Handle intra-page faults in copy_mount_options()
arm64: mte: ptrace: Add NT_ARM_TAGGED_ADDR_CTRL regset
arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
arm64: mte: Allow {set,get}_tagged_addr_ctrl() on non-current tasks
arm64: mte: Restore the GCR_EL1 register after a suspend
arm64: mte: Allow user control of the generated random tags via prctl()
arm64: mte: Allow user control of the tag check mode via prctl()
mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
mm: Introduce arch_validate_flags()
arm64: mte: Add PROT_MTE support to mmap() and mprotect()
mm: Introduce arch_calc_vm_flag_bits()
arm64: mte: Tags-aware aware memcmp_pages() implementation
arm64: Avoid unnecessary clear_user_page() indirection
...
Fix and subsequently rewrite Spectre mitigations, including the addition
of support for PR_SPEC_DISABLE_NOEXEC.
(Will Deacon and Marc Zyngier)
* for-next/ghostbusters: (22 commits)
arm64: Add support for PR_SPEC_DISABLE_NOEXEC prctl() option
arm64: Pull in task_stack_page() to Spectre-v4 mitigation code
KVM: arm64: Allow patching EL2 vectors even with KASLR is not enabled
arm64: Get rid of arm64_ssbd_state
KVM: arm64: Convert ARCH_WORKAROUND_2 to arm64_get_spectre_v4_state()
KVM: arm64: Get rid of kvm_arm_have_ssbd()
KVM: arm64: Simplify handling of ARCH_WORKAROUND_2
arm64: Rewrite Spectre-v4 mitigation code
arm64: Move SSBD prctl() handler alongside other spectre mitigation code
arm64: Rename ARM64_SSBD to ARM64_SPECTRE_V4
arm64: Treat SSBS as a non-strict system feature
arm64: Group start_thread() functions together
KVM: arm64: Set CSV2 for guests on hardware unaffected by Spectre-v2
arm64: Rewrite Spectre-v2 mitigation code
arm64: Introduce separate file for spectre mitigations and reporting
arm64: Rename ARM64_HARDEN_BRANCH_PREDICTOR to ARM64_SPECTRE_V2
KVM: arm64: Simplify install_bp_hardening_cb()
KVM: arm64: Replace CONFIG_KVM_INDIRECT_VECTORS with CONFIG_RANDOMIZE_BASE
arm64: Remove Spectre-related CONFIG_* options
arm64: Run ARCH_WORKAROUND_2 enabling code on all CPUs
...
Remove unused functions and parameters from ACPI IORT code.
(Zenghui Yu via Lorenzo Pieralisi)
* for-next/acpi:
ACPI/IORT: Remove the unused inline functions
ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
Remove redundant code and fix documentation of caching behaviour for the
HVC_SOFT_RESTART hypercall.
(Pingfan Liu)
* for-next/boot:
Documentation/kvm/arm: improve description of HVC_SOFT_RESTART
arm64/relocate_kernel: remove redundant code
Improve reporting of unexpected kernel traps due to BPF JIT failure.
(Will Deacon)
* for-next/bpf:
arm64: Improve diagnostics when trapping BRK with FAULT_BRK_IMM
Improve robustness of user-visible HWCAP strings and their corresponding
numerical constants.
(Anshuman Khandual)
* for-next/cpuinfo:
arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions
Cleanups to handling of SVE and FPSIMD register state in preparation
for potential future optimisation of handling across syscalls.
(Julien Grall)
* for-next/fpsimd:
arm64/sve: Implement a helper to load SVE registers from FPSIMD state
arm64/sve: Implement a helper to flush SVE registers
arm64/fpsimdmacros: Allow the macro "for" to be used in more cases
arm64/fpsimdmacros: Introduce a macro to update ZCR_EL1.LEN
arm64/signal: Update the comment in preserve_sve_context
arm64/fpsimd: Update documentation of do_sve_acc
Miscellaneous changes.
(Tian Tao and others)
* for-next/misc:
arm64/mm: return cpu_all_mask when node is NUMA_NO_NODE
arm64: mm: Fix missing-prototypes in pageattr.c
arm64/fpsimd: Fix missing-prototypes in fpsimd.c
arm64: hibernate: Remove unused including <linux/version.h>
arm64/mm: Refactor {pgd, pud, pmd, pte}_ERROR()
arm64: Remove the unused include statements
arm64: get rid of TEXT_OFFSET
arm64: traps: Add str of description to panic() in die()
Memory management updates and cleanups.
(Anshuman Khandual and others)
* for-next/mm:
arm64: dbm: Invalidate local TLB when setting TCR_EL1.HD
arm64: mm: Make flush_tlb_fix_spurious_fault() a no-op
arm64/mm: Unify CONT_PMD_SHIFT
arm64/mm: Unify CONT_PTE_SHIFT
arm64/mm: Remove CONT_RANGE_OFFSET
arm64/mm: Enable THP migration
arm64/mm: Change THP helpers to comply with generic MM semantics
arm64/mm/ptdump: Add address markers for BPF regions
Allow prefetchable PCI BARs to be exposed to userspace using normal
non-cacheable mappings.
(Clint Sbisa)
* for-next/pci:
arm64: Enable PCI write-combine resources under sysfs
Perf/PMU driver updates.
(Julien Thierry and others)
* for-next/perf:
perf: arm-cmn: Fix conversion specifiers for node type
perf: arm-cmn: Fix unsigned comparison to less than zero
arm_pmu: arm64: Use NMIs for PMU
arm_pmu: Introduce pmu_irq_ops
KVM: arm64: pmu: Make overflow handler NMI safe
arm64: perf: Defer irq_work to IPI_IRQ_WORK
arm64: perf: Remove PMU locking
arm64: perf: Avoid PMXEV* indirection
arm64: perf: Add missing ISB in armv8pmu_enable_counter()
perf: Add Arm CMN-600 PMU driver
perf: Add Arm CMN-600 DT binding
arm64: perf: Add support caps under sysfs
drivers/perf: thunderx2_pmu: Fix memory resource error handling
drivers/perf: xgene_pmu: Fix uninitialized resource struct
perf: arm_dsu: Support DSU ACPI devices
arm64: perf: Remove unnecessary event_idx check
drivers/perf: hisi: Add missing include of linux/module.h
arm64: perf: Add general hardware LLC events for PMUv3
Support for the Armv8.3 Pointer Authentication enhancements.
(By Amit Daniel Kachhap)
* for-next/ptrauth:
arm64: kprobe: clarify the comment of steppable hint instructions
arm64: kprobe: disable probe of fault prone ptrauth instruction
arm64: cpufeature: Modify address authentication cpufeature to exact
arm64: ptrauth: Introduce Armv8.3 pointer authentication enhancements
arm64: traps: Allow force_signal_inject to pass esr error code
arm64: kprobe: add checks for ARMv8.3-PAuth combined instructions
Tonnes of cleanup to the SDEI driver.
(Gavin Shan)
* for-next/sdei:
firmware: arm_sdei: Remove _sdei_event_unregister()
firmware: arm_sdei: Remove _sdei_event_register()
firmware: arm_sdei: Introduce sdei_do_local_call()
firmware: arm_sdei: Cleanup on cross call function
firmware: arm_sdei: Remove while loop in sdei_event_unregister()
firmware: arm_sdei: Remove while loop in sdei_event_register()
firmware: arm_sdei: Remove redundant error message in sdei_probe()
firmware: arm_sdei: Remove duplicate check in sdei_get_conduit()
firmware: arm_sdei: Unregister driver on error in sdei_init()
firmware: arm_sdei: Avoid nested statements in sdei_init()
firmware: arm_sdei: Retrieve event number from event instance
firmware: arm_sdei: Common block for failing path in sdei_event_create()
firmware: arm_sdei: Remove sdei_is_err()
Selftests for Pointer Authentication and FPSIMD/SVE context-switching.
(Mark Brown and Boyan Karatotev)
* for-next/selftests:
selftests: arm64: Add build and documentation for FP tests
selftests: arm64: Add wrapper scripts for stress tests
selftests: arm64: Add utility to set SVE vector lengths
selftests: arm64: Add stress tests for FPSMID and SVE context switching
selftests: arm64: Add test for the SVE ptrace interface
selftests: arm64: Test case for enumeration of SVE vector lengths
kselftests/arm64: add PAuth tests for single threaded consistency and differently initialized keys
kselftests/arm64: add PAuth test for whether exec() changes keys
kselftests/arm64: add nop checks for PAuth tests
kselftests/arm64: add a basic Pointer Authentication test
Implementation of ARCH_STACKWALK for unwinding.
(Mark Brown)
* for-next/stacktrace:
arm64: Move console stack display code to stacktrace.c
arm64: stacktrace: Convert to ARCH_STACKWALK
arm64: stacktrace: Make stack walk callback consistent with generic code
stacktrace: Remove reliable argument from arch_stack_walk() callback
Support for ASID pinning, which is required when sharing page-tables with
the SMMU.
(Jean-Philippe Brucker)
* for-next/svm:
arm64: cpufeature: Export symbol read_sanitised_ftr_reg()
arm64: mm: Pin down ASIDs for sharing mm with devices
Rely on firmware tables for establishing CPU topology.
(Valentin Schneider)
* for-next/topology:
arm64: topology: Stop using MPIDR for topology information
Spelling fixes.
(Xiaoming Ni and Yanfei Xu)
* for-next/tpyos:
arm64/numa: Fix a typo in comment of arm64_numa_init
arm64: fix some spelling mistakes in the comments by codespell
vDSO cleanups.
(Will Deacon)
* for-next/vdso:
arm64: vdso: Fix unusual formatting in *setup_additional_pages()
arm64: vdso32: Remove a bunch of #ifdef CONFIG_COMPAT_VDSO guards
TCR_EL1.HD is permitted to be cached in a TLB, so invalidate the local
TLB after setting the bit when detected support for the feature. Although
this isn't strictly necessary, since we can happily operate with the bit
effectively clear, the current code uses an ISB in a half-hearted attempt
to make the change effective, so let's just fix that up.
Link: https://lore.kernel.org/r/20201001110405.18617-1-will@kernel.org
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Jonathan reports that the strict policy for memory mapped by the
ACPI core breaks the use case of passing ACPI table overrides via
initramfs. This is due to the fact that the memory type used for
loading the initramfs in memory is not recognized as a memory type
that is typically used by firmware to pass firmware tables.
Since the purpose of the strict policy is to ensure that no AML or
other ACPI code can manipulate any memory that is used by the kernel
to keep its internal state or the state of user tasks, we can relax
the permission check, and allow mappings of memory that is reserved
and marked as NOMAP via memblock, and therefore not covered by the
linear mapping to begin with.
Fixes: 1583052d11 ("arm64/acpi: disallow AML memory opregions to access kernel memory")
Fixes: 325f5585ec ("arm64/acpi: disallow writeable AML opregion mapping for EFI code regions")
Reported-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Link: https://lore.kernel.org/r/20200929132522.18067-1-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Add hyp percpu section to linker script and rename the corresponding ELF
sections of hyp/nvhe object files. This moves all nVHE-specific percpu
variables to the new hyp percpu section.
Allocate sufficient amount of memory for all percpu hyp regions at global KVM
init time and create corresponding hyp mappings.
The base addresses of hyp percpu regions are kept in a dynamically allocated
array in the kernel.
Add NULL checks in PMU event-reset code as it may run before KVM memory is
initialized.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-10-dbrazdil@google.com
Host CPU context is stored in a global per-cpu variable `kvm_host_data`.
In preparation for introducing independent per-CPU region for nVHE hyp,
create two separate instances of `kvm_host_data`, one for VHE and one
for nVHE.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-9-dbrazdil@google.com
Hyp keeps track of which cores require SSBD callback by accessing a
kernel-proper global variable. Create an nVHE symbol of the same name
and copy the value from kernel proper to nVHE as KVM is being enabled
on a core.
Done in preparation for separating percpu memory owned by kernel
proper and nVHE.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-8-dbrazdil@google.com
Minor cleanup that only creates __kvm_ex_table ELF section and
related symbols if CONFIG_KVM is enabled. Also useful as more
hyp-specific sections will be added.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-4-dbrazdil@google.com
Minor cleanup to move all macros related to prefixing nVHE hyp section
and symbol names into one place: hyp_image.h.
Signed-off-by: David Brazdil <dbrazdil@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200922204910.7265-3-dbrazdil@google.com
The PR_SPEC_DISABLE_NOEXEC option to the PR_SPEC_STORE_BYPASS prctl()
allows the SSB mitigation to be enabled only until the next execve(),
at which point the state will revert back to PR_SPEC_ENABLE and the
mitigation will be disabled.
Add support for PR_SPEC_DISABLE_NOEXEC on arm64.
Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
The kbuild robot reports that we're relying on an implicit inclusion to
get a definition of task_stack_page() in the Spectre-v4 mitigation code,
which is not always in place for some configurations:
| arch/arm64/kernel/proton-pack.c:329:2: error: implicit declaration of function 'task_stack_page' [-Werror,-Wimplicit-function-declaration]
| task_pt_regs(task)->pstate |= val;
| ^
| arch/arm64/include/asm/processor.h:268:36: note: expanded from macro 'task_pt_regs'
| ((struct pt_regs *)(THREAD_SIZE + task_stack_page(p)) - 1)
| ^
| arch/arm64/kernel/proton-pack.c:329:2: note: did you mean 'task_spread_page'?
Add the missing include to fix the build error.
Fixes: a44acf477220 ("arm64: Move SSBD prctl() handler alongside other spectre mitigation code")
Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/r/202009260013.Ul7AD29w%lkp@intel.com
Signed-off-by: Will Deacon <will@kernel.org>