Commit Graph

3628 Commits

Author SHA1 Message Date
Guilherme G. Piccoli
31e833b203 arm64: Unhash early pointer print plus improve comment
When facing a really early issue on DT parsing we have currently
a message that shows both the physical and virtual address of the FDT.
The printk pointer modifier for the virtual address shows a hashed
address there unless the user provides "no_hash_pointers" parameter in
the command-line. The situation in which this message shows-up is a bit
more serious though: the boot process is broken, nothing can be done
(even an oops is too much for this early stage) so we have this message
as a last resort in order to help debug bootloader issues, for example.
Hence, we hereby change that to "%px" in order to make debugging easy,
there's not much information leak risk in such early boot failure.

Also, we tried to improve a bit the commenting on that function, given
that if kernel fails there, it just hangs forever in a cpu_relax() loop.
The reason we cannot BUG/panic is that is too early to do so; thanks to
Mark Brown for pointing that on IRC and thanks Robin Murphy for the good
pointer hash discussion in the mailing-list.

Cc: Mark Brown <broonie@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20211221155230.1532850-1-gpiccoli@igalia.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-22 11:06:11 +00:00
Jakub Kicinski
7cd2802d74 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-16 16:13:19 -08:00
Lakshmi Ramasubramanian
9c5d89bc10 arm64: kexec: Fix missing error code 'ret' warning in load_other_segments()
Since commit ac10be5cdb ("arm64: Use common
of_kexec_alloc_and_setup_fdt()"), smatch reports the following warning:

  arch/arm64/kernel/machine_kexec_file.c:152 load_other_segments()
  warn: missing error code 'ret'

Return code is not set to an error code in load_other_segments() when
of_kexec_alloc_and_setup_fdt() call returns a NULL dtb. This results
in status success (return code set to 0) being returned from
load_other_segments().

Set return code to -EINVAL if of_kexec_alloc_and_setup_fdt() returns
NULL dtb.

Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: ac10be5cdb ("arm64: Use common of_kexec_alloc_and_setup_fdt()")
Link: https://lore.kernel.org/r/20211210010121.101823-1-nramas@linux.microsoft.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-16 17:44:58 +00:00
Kefeng Wang
dd03762ab6 arm64: Enable KCSAN
This patch enables KCSAN for arm64, with updates to build rules
to not use KCSAN for several incompatible compilation units.

Recent GCC version(at least GCC10) made outline-atomics as the
default option(unlike Clang), which will cause linker errors
for kernel/kcsan/core.o. Disables the out-of-line atomics by
no-outline-atomics to fix the linker errors.

Meanwhile, as Mark said[1], some latent issues are needed to be
fixed which isn't just a KCSAN problem, we make the KCSAN depends
on EXPERT for now.

Tested selftest and kcsan_test(built with GCC11 and Clang 13),
and all passed.

[1] https://lkml.kernel.org/r/YadiUPpJ0gADbiHQ@FVFF77S0Q05N

Acked-by: Marco Elver <elver@google.com> # kernel/kcsan
Tested-by: Joey Gouly <joey.gouly@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20211211131734.126874-1-wangkefeng.wang@huawei.com
[catalin.marinas@arm.com: added comment to justify EXPERT]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:54:34 +00:00
Mark Brown
12b792e5e2 arm64/fp: Add comments documenting the usage of state restore functions
Add comments to help people figure out when fpsimd_bind_state_to_cpu() and
fpsimd_update_current_state() are used.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211207163250.1373542-1-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:36:19 +00:00
Mark Brown
30c43e73b3 arm64/sve: Generalise vector length configuration prctl() for SME
In preparation for adding SME support update the bulk of the implementation
for the vector length configuration prctl() calls to be independent of
vector type.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-3-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:33:44 +00:00
Mark Brown
97bcbee404 arm64/sve: Make sysctl interface for SVE reusable by SME
The vector length configuration for SME is very similar to that for SVE
so in order to allow reuse refactor the SVE configuration so that it takes
the vector type from the struct ctl_table. Since there's no dedicated space
for this we repurpose the extra1 field to store the vector type, this is
otherwise unused for integer sysctls.

Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211210184133.320748-2-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:33:44 +00:00
Will Deacon
1609c22a8a Merge branch 'for-next/perf-cpu' into for-next/perf
* for-next/perf-cpu:
  arm64: perf: Support new DT compatibles
  arm64: perf: Simplify registration boilerplate
  arm64: perf: Support Denver and Carmel PMUs
2021-12-14 18:13:25 +00:00
Mark Brown
742a15b1a2 arm64: Use BTI C directly and unconditionally
Now we have a macro for BTI C that looks like a regular instruction change
all the users of the current BTI_C macro to just emit a BTI C directly and
remove the macro.

This does mean that we now unconditionally BTI annotate all assembly
functions, meaning that they are worse in this respect than code generated
by the compiler. The overhead should be minimal for implementations with a
reasonable HINT implementation.

Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211214152714.2380849-4-broonie@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-14 18:12:58 +00:00
Robin Murphy
893c34b60a arm64: perf: Support new DT compatibles
Wire up the new DT compatibles so we can present appropriate
PMU names to userspace for the latest and greatest CPUs.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/62d14ba12d847ec7f1fba7cb0b3b881b437e1cc5.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 18:05:11 +00:00
Robin Murphy
6ac9f30bd4 arm64: perf: Simplify registration boilerplate
With the trend for per-core events moving to userspace JSON, registering
names for PMUv3 implementations is increasingly a pure boilerplate
exercise. Let's wrap things a step further so we can generate the basic
PMUv3 init function with a macro invocation, and reduce further new
addition to just 2 lines each.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/b79477ea3b97f685d00511d4ecd2f686184dca34.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 18:05:11 +00:00
Thierry Reding
d4c4844a9b arm64: perf: Support Denver and Carmel PMUs
Add support for the NVIDIA Denver and Carmel PMUs using the generic
PMUv3 event map for now.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
[ rm: reorder entries alphabetically ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/5f0f69d47acca78a9e479501aa4d8b429e23cf11.1639490264.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 18:05:11 +00:00
Rob Herring
83a7a4d643 arm64: perf: Enable PMU counter userspace access for perf event
Arm PMUs can support direct userspace access of counters which allows for
low overhead (i.e. no syscall) self-monitoring of tasks. The same feature
exists on x86 called 'rdpmc'. Unlike x86, userspace access will only be
enabled for thread bound events. This could be extended if needed, but
simplifies the implementation and reduces the chances for any
information leaks (which the x86 implementation suffers from).

PMU EL0 access will be enabled when an event with userspace access is
part of the thread's context. This includes when the event is not
scheduled on the PMU. There's some additional overhead clearing
dirty counters when access is enabled in order to prevent leaking
disabled counter data from other tasks.

Unlike x86, enabling of userspace access must be requested with a new
attr bit: config1:1. If the user requests userspace access with 64-bit
counters, then the event open will fail if the h/w doesn't support
64-bit counters. Chaining is not supported with userspace access. The
modes for config1 are as follows:

config1 = 0 : user access disabled and always 32-bit
config1 = 1 : user access disabled and always 64-bit (using chaining if needed)
config1 = 2 : user access enabled and always 32-bit
config1 = 3 : user access enabled and always 64-bit

Based on work by Raphael Gault <raphael.gault@arm.com>, but has been
completely re-written.

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-5-robh@kernel.org
[will: Made armv8pmu_proc_user_access_handler() static]
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 11:40:59 +00:00
Rob Herring
e201260081 arm64: perf: Add userspace counter access disable switch
Like x86, some users may want to disable userspace PMU counter
altogether. Add a sysctl 'perf_user_access' file to control userspace
counter access. The default is '0' which is disabled. Writing '1'
enables access.

Note that x86 supports globally enabling user access by writing '2' to
/sys/bus/event_source/devices/cpu/rdpmc. As there's not existing
userspace support to worry about, this shouldn't be necessary for Arm.
It could be added later if the need arises.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-perf-users@vger.kernel.org
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20211208201124.310740-4-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-14 11:30:54 +00:00
Joey Gouly
1175011a7d arm64: cpufeature: add HWCAP for FEAT_RPRES
Add a new HWCAP to detect the Increased precision of Reciprocal Estimate
and Reciprocal Square Root Estimate feature (FEAT_RPRES), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64ISAR2_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-4-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:53:00 +00:00
Joey Gouly
9e45365f14 arm64: add ID_AA64ISAR2_EL1 sys register
This is a new ID register, introduced in 8.7.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Reiji Watanabe <reijiw@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-3-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:53:00 +00:00
Joey Gouly
5c13f042e7 arm64: cpufeature: add HWCAP for FEAT_AFP
Add a new HWCAP to detect the Alternate Floating-point Behaviour
feature (FEAT_AFP), introduced in Armv8.7.

Also expose this to userspace in the ID_AA64MMFR1_EL1 feature register.

Signed-off-by: Joey Gouly <joey.gouly@arm.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-2-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-13 18:52:59 +00:00
Eric W. Biederman
0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Mark Rutland
d2d1d2645c arm64: Make some stacktrace functions private
Now that open-coded stack unwinds have been converted to
arch_stack_walk(), we no longer need to expose any of unwind_frame(),
walk_stackframe(), or start_backtrace() outside of stacktrace.c.

Make those functions private to stacktrace.c, removing their prototypes
from <asm/stacktrace.h> and marking them static.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-10-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:04 +00:00
Madhavan T. Venkataraman
2dad6dc17b arm64: Make dump_backtrace() use arch_stack_walk()
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic.

Currently, dump_backtrace() walks the stack of the current task or a
blocked task by calling stact_backtrace() and iterating unwind steps
using unwind_frame(). This can be written more simply in terms of
arch_stack_walk(), considering three distinct cases:

1) When unwinding a blocked task, start_backtrace() is called with the
   blocked task's saved PC and FP, and the unwind proceeds immediately
   from this point without skipping any entries. This is functionally
   equivalent to calling arch_stack_walk() with the blocked task, which
   will start with the task's saved PC and FP.

   There is no functional change to this case.

2) When unwinding the current task without regs, start_backtrace() is
   called with dump_backtrace() as the PC and __builtin_frame_address(0)
   as the next frame, and the unwind proceeds immediately without
   skipping. This is *almost* functionally equivalent to calling
   arch_stack_walk() for the current task, which will start with its
   caller (i.e. an offset into dump_backtrace()) as the PC, and the
   callers frame record as the next frame.

   The only difference being that dump_backtrace() will be reported with
   an offset (which is strictly more correct than currently). Otherwise
   there is no functional cahnge to this case.

3) When unwinding the current task with regs, start_backtrace() is
   called with dump_backtrace() as the PC and __builtin_frame_address(0)
   as the next frame, and the unwind is performed silently until the
   next frame is the frame pointed to by regs->fp. Reporting starts
   from regs->pc and continues from the frame in regs->fp.

   Historically, this pre-unwind was necessary to correctly record
   return addresses rewritten by the ftrace graph calller, but this is
   no longer necessary as these are now recovered using the FP since
   commit:

   c6d3cd32fd ("arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR")

   This pre-unwind is not necessary to recover return addresses
   rewritten by kretprobes, which historically were not recovered, and
   are now recovered using the FP since commit:

   cd9bc2c925 ("arm64: Recover kretprobe modified return address in stacktrace")

   Thus, this is functionally equivalent to calling arch_stack_walk()
   with the current task and regs, which will start with regs->pc as the
   PC and regs->fp as the next frame, without a pre-unwind.

This patch makes dump_backtrace() use arch_stack_walk(). This simplifies
dump_backtrace() and will permit subsequent changes to the unwind code.

Aside from the improved reporting when unwinding current without regs,
there should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
[Mark: elaborate commit message]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:04 +00:00
Madhavan T. Venkataraman
22ecd975b6 arm64: Make profile_pc() use arch_stack_walk()
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently profile_pc() walks the stack of an interrupted context by
calling start_backtrace() with the context's PC and FP, and iterating
unwind steps using walk_stackframe(). This is functionally equivalent to
calling arch_stack_walk() with the interrupted context's pt_regs, which
will start with the PC and FP from the regs.

Make profile_pc() use arch_stack_walk(). This simplifies profile_pc(),
and in future will alow us to make walk_stackframe() private to
stacktrace.c.

At the same time, we remove the early return for when regs->pc is not in
lock functions, as this will be handled by the first call to the
profile_pc_cb() callback.

There should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[Mark: remove early return, elaborate commit message, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-8-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:04 +00:00
Madhavan T. Venkataraman
39ef362d2d arm64: Make return_address() use arch_stack_walk()
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently return_address() walks the stack of the current task by
calling start_backtrace() with return_address as the PC and the frame
pointer of return_address() as the next frame, iterating unwind steps
using walk_stackframe(). This is functionally equivalent to calling
arch_stack_walk() for the current stack, which will start from its
caller (i.e. return_address()) as the PC and it's caller's frame record
as the next frame.

Make return_address() use arch_stackwalk(). This simplifies
return_address(), and in future will alow us to make walk_stackframe()
private to stacktrace.c.

There should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[Mark: elaborate commit message, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-7-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:04 +00:00
Madhavan T. Venkataraman
4f62bb7cb1 arm64: Make __get_wchan() use arch_stack_walk()
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently, __get_wchan() walks the stack of a blocked task by calling
start_backtrace() with the task's saved PC and FP values, and iterating
unwind steps using unwind_frame(). The initialization is functionally
equivalent to calling arch_stack_walk() with the blocked task, which
will start with the task's saved PC and FP values.

Currently __get_wchan() always performs an initial unwind step, which
will stkip __switch_to(), but as this is now marked as a __sched
function, this no longer needs special handling and will be skipped in
the same way as other sched functions.

Make __get_wchan() use arch_stack_walk(). This simplifies __get_wchan(),
and in future will alow us to make unwind_frame() private to
stacktrace.c. At the same time, we can simplify the try_get_task_stack()
check and avoid the unnecessary `stack_page` variable.

The change to the skipping logic means we may terminate one frame
earlier than previously where there are an excessive number of sched
functions in the trace, but this isn't seen in practice, and wchan is
best-effort anyway, so this should not be a problem.

Other than the above, there should be no functional change as a result
of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
[Mark: rebase atop wchan changes, elaborate commit message, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:04 +00:00
Madhavan T. Venkataraman
ed876d35a1 arm64: Make perf_callchain_kernel() use arch_stack_walk()
To enable RELIABLE_STACKTRACE and LIVEPATCH on arm64, we need to
substantially rework arm64's unwinding code. As part of this, we want to
minimize the set of unwind interfaces we expose, and avoid open-coding
of unwind logic outside of stacktrace.c.

Currently perf_callchain_kernel() walks the stack of an interrupted
context by calling start_backtrace() with the context's PC and FP, and
iterating unwind steps using walk_stackframe(). This is functionally
equivalent to calling arch_stack_walk() with the interrupted context's
pt_regs, which will start with the PC and FP from the regs.

Make perf_callchain_kernel() use arch_stack_walk(). This simplifies
perf_callchain_kernel(), and in future will alow us to make
walk_stackframe() private to stacktrace.c.

At the same time, we update the callchain_trace() callback to check the
return value of perf_callchain_store(), which indicates whether there is
space for any further entries. When a non-zero value is returned,
further calls will be ignored, and are redundant, so we can stop the
unwind at this point.

We also remove the stale and confusing comment for callchain_trace.

There should be no functional change as a result of this patch.

Signed-off-by: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
[Mark: elaborate commit message, remove comment, fix includes]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
Mark Rutland
86bcbafcb7 arm64: Mark __switch_to() as __sched
Unlike most architectures (and only in keeping with powerpc), arm64 has
a non __sched() function on the path to our cpu_switch_to() assembly
function.

It is expected that for a blocked task, in_sched_functions() can be used
to skip all functions between the raw context switch assembly and the
scheduler functions that call into __switch_to(). This is the behaviour
expected by stack_trace_consume_entry_nosched(), and the behaviour we'd
like to have such that we an simplify arm64's __get_wchan()
implementation to use arch_stack_walk().

This patch mark's arm64's __switch_to as __sched. This *will not* change
the behaviour of arm64's current __get_wchan() implementation, which
always performs an initial unwind step which skips __switch_to(). This
*will* change the behaviour of stack_trace_consume_entry_nosched() and
stack_trace_save_tsk() to match their expected behaviour on blocked
tasks, skipping all scheduler-internal functions including
__switch_to().

Other than the above, there should be no functional change as a result
of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129142849.3056714-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
Peter Zijlstra
1614b2b11f arch: Make ARCH_STACKWALK independent of STACKTRACE
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
Rongwei Wang
7afccde389 arm64: kexec: reduce calls to page_address()
In kexec_page_alloc(), page_address() is called twice.
This patch add a new variable to help to reduce calls
to page_address().

Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211125170600.1608-3-rongwei.wang@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 13:57:25 +00:00
Jakub Kicinski
3150a73366 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-09 13:23:02 -08:00
Marc Zyngier
142ff9bddb KVM: arm64: Drop unused workaround_flags vcpu field
workaround_flags is a leftover from our earlier Spectre-v4 workaround
implementation, and now serves no purpose.

Get rid of the field and the corresponding asm-offset definition.

Fixes: 29e8910a56 ("KVM: arm64: Simplify handling of ARCH_WORKAROUND_2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-08 14:54:07 +00:00
Florian Fainelli
c9f5ea08a0 arm64: entry: Use SDEI event constants
Use SDEI_EV_FAILED instead of open coding the 1 to make it clearer how
SDEI_EVENT_COMPLETE vs. SDEI_EVENT_COMPLETE_AND_RESUME is selected.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20211118201811.2974922-1-f.fainelli@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-06 16:27:51 +00:00
Rob Herring
b6363fe7b5 arm64: Simplify checking for populated DT
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework dt_scan_depth1_nodes to
use libfdt calls directly, and rename it to dt_is_stub() to reflect
exactly what it checking.

Cc: Will Deacon <will@kernel.org>
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20211029144055.2365814-1-robh@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-06 16:12:53 +00:00
Mark Rutland
35b6b28e69 arm64: ftrace: add missing BTIs
When branch target identifiers are in use, code reachable via an
indirect branch requires a BTI landing pad at the branch target site.

When building FTRACE_WITH_REGS atop patchable-function-entry, we miss
BTIs at the start start of the `ftrace_caller` and `ftrace_regs_caller`
trampolines, and when these are called from a module via a PLT (which
will use a `BR X16`), we will encounter a BTI failure, e.g.

| # insmod lkdtm.ko
| lkdtm: No crash points registered, enable through debugfs
| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # cat /sys/kernel/debug/provoke-crash/DIRECT
| Unhandled 64-bit el1h sync exception on CPU0, ESR 0x34000001 -- BTI
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400405 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=jc)
| pc : ftrace_caller+0x0/0x3c
| lr : lkdtm_debugfs_open+0xc/0x20 [lkdtm]
| sp : ffff800012e43b00
| x29: ffff800012e43b00 x28: 0000000000000000 x27: ffff800012e43c88
| x26: 0000000000000000 x25: 0000000000000000 x24: ffff0000c171f200
| x23: ffff0000c27b1e00 x22: ffff0000c2265240 x21: ffff0000c23c8c30
| x20: ffff8000090ba380 x19: 0000000000000000 x18: 0000000000000000
| x17: 0000000000000000 x16: ffff80001002bb4c x15: 0000000000000000
| x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000900ff0
| x11: ffff0000c4166310 x10: ffff800012e43b00 x9 : ffff8000104f2384
| x8 : 0000000000000001 x7 : 0000000000000000 x6 : 000000000000003f
| x5 : 0000000000000040 x4 : ffff800012e43af0 x3 : 0000000000000001
| x2 : ffff8000090b0000 x1 : ffff0000c171f200 x0 : ffff0000c23c8c30
| Kernel panic - not syncing: Unhandled exception
| CPU: 0 PID: 174 Comm: cat Not tainted 5.16.0-rc2-dirty #3
| Hardware name: linux,dummy-virt (DT)
| Call trace:
|  dump_backtrace+0x0/0x1a4
|  show_stack+0x24/0x30
|  dump_stack_lvl+0x68/0x84
|  dump_stack+0x1c/0x38
|  panic+0x168/0x360
|  arm64_exit_nmi.isra.0+0x0/0x80
|  el1h_64_sync_handler+0x68/0xd4
|  el1h_64_sync+0x78/0x7c
|  ftrace_caller+0x0/0x3c
|  do_dentry_open+0x134/0x3b0
|  vfs_open+0x38/0x44
|  path_openat+0x89c/0xe40
|  do_filp_open+0x8c/0x13c
|  do_sys_openat2+0xbc/0x174
|  __arm64_sys_openat+0x6c/0xbc
|  invoke_syscall+0x50/0x120
|  el0_svc_common.constprop.0+0xdc/0x100
|  do_el0_svc+0x84/0xa0
|  el0_svc+0x28/0x80
|  el0t_64_sync_handler+0xa8/0x130
|  el0t_64_sync+0x1a0/0x1a4
| SMP: stopping secondary CPUs
| Kernel Offset: disabled
| CPU features: 0x0,00000f42,da660c5f
| Memory Limit: none
| ---[ end Kernel panic - not syncing: Unhandled exception ]---

Fix this by adding the required `BTI C`, as we only require these to be
reachable via BL for direct calls or BR X16/X17 for PLTs. For now, these
are open-coded in the function prologue, matching the style of the
`__hwasan_tag_mismatch` trampoline.

In future we may wish to consider adding a new SYM_CODE_START_*()
variant which has an implicit BTI.

When ftrace is built atop mcount, the trampolines are marked with
SYM_FUNC_START(), and so get an implicit BTI. We may need to change
these over to SYM_CODE_START() in future for RELIABLE_STACKTRACE, in
case we need to apply special care aroud the return address being
rewritten.

Fixes: 97fed779f2 ("arm64: bti: Provide Kconfig for kernel mode BTI")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211129135709.2274019-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-02 10:18:32 +00:00
Mark Rutland
2f2183243f arm64: kexec: use __pa_symbol(empty_zero_page)
In machine_kexec_post_load() we use __pa() on `empty_zero_page`, so that
we can use the physical address during arm64_relocate_new_kernel() to
switch TTBR1 to a new set of tables. While `empty_zero_page` is part of
the old kernel, we won't clobber it until after this switch, so using it
is benign.

However, `empty_zero_page` is part of the kernel image rather than a
linear map address, so it is not correct to use __pa(x), and we should
instead use __pa_symbol(x) or __pa(lm_alias(x)). Otherwise, when the
kernel is built with DEBUG_VIRTUAL, we'll encounter splats as below, as
I've seen when fuzzing v5.16-rc3 with Syzkaller:

| ------------[ cut here ]------------
| virt_to_phys used for non-linear address: 000000008492561a (empty_zero_page+0x0/0x1000)
| WARNING: CPU: 3 PID: 11492 at arch/arm64/mm/physaddr.c:15 __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| CPU: 3 PID: 11492 Comm: syz-executor.0 Not tainted 5.16.0-rc3-00001-g48bd452a045c #1
| Hardware name: linux,dummy-virt (DT)
| pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| lr : __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
| sp : ffff80001af17bb0
| x29: ffff80001af17bb0 x28: ffff1cc65207b400 x27: ffffb7828730b120
| x26: 0000000000000e11 x25: 0000000000000000 x24: 0000000000000001
| x23: ffffb7828963e000 x22: ffffb78289644000 x21: 0000600000000000
| x20: 000000000000002d x19: 0000b78289644000 x18: 0000000000000000
| x17: 74706d6528206131 x16: 3635323934383030 x15: 303030303030203a
| x14: 1ffff000035e2eb8 x13: ffff6398d53f4f0f x12: 1fffe398d53f4f0e
| x11: 1fffe398d53f4f0e x10: ffff6398d53f4f0e x9 : ffffb7827c6f76dc
| x8 : ffff1cc6a9fa7877 x7 : 0000000000000001 x6 : ffff6398d53f4f0f
| x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff1cc66f2a99c0
| x2 : 0000000000040000 x1 : d7ce7775b09b5d00 x0 : 0000000000000000
| Call trace:
|  __virt_to_phys+0x120/0x1c0 arch/arm64/mm/physaddr.c:12
|  machine_kexec_post_load+0x284/0x670 arch/arm64/kernel/machine_kexec.c:150
|  do_kexec_load+0x570/0x670 kernel/kexec.c:155
|  __do_sys_kexec_load kernel/kexec.c:250 [inline]
|  __se_sys_kexec_load kernel/kexec.c:231 [inline]
|  __arm64_sys_kexec_load+0x1d8/0x268 kernel/kexec.c:231
|  __invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
|  invoke_syscall+0x90/0x2e0 arch/arm64/kernel/syscall.c:52
|  el0_svc_common.constprop.2+0x1e4/0x2f8 arch/arm64/kernel/syscall.c:142
|  do_el0_svc+0xf8/0x150 arch/arm64/kernel/syscall.c:181
|  el0_svc+0x60/0x248 arch/arm64/kernel/entry-common.c:603
|  el0t_64_sync_handler+0x90/0xb8 arch/arm64/kernel/entry-common.c:621
|  el0t_64_sync+0x180/0x184 arch/arm64/kernel/entry.S:572
| irq event stamp: 2428
| hardirqs last  enabled at (2427): [<ffffb7827c6f2308>] __up_console_sem+0xf0/0x118 kernel/printk/printk.c:255
| hardirqs last disabled at (2428): [<ffffb7828223df98>] el1_dbg+0x28/0x80 arch/arm64/kernel/entry-common.c:375
| softirqs last  enabled at (2424): [<ffffb7827c411c00>] softirq_handle_end kernel/softirq.c:401 [inline]
| softirqs last  enabled at (2424): [<ffffb7827c411c00>] __do_softirq+0xa28/0x11e4 kernel/softirq.c:587
| softirqs last disabled at (2417): [<ffffb7827c59015c>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] invoke_softirq kernel/softirq.c:439 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] __irq_exit_rcu kernel/softirq.c:636 [inline]
| softirqs last disabled at (2417): [<ffffb7827c59015c>] irq_exit_rcu+0x53c/0x688 kernel/softirq.c:648
| ---[ end trace 0ca578534e7ca938 ]---

With or without DEBUG_VIRTUAL __pa() will fall back to __kimg_to_phys()
for non-linear addresses, and will happen to do the right thing in this
case, even with the warning. But we should not depend upon this, and to
keep the warning useful we should fix this case.

Fix this issue by using __pa_symbol(), which handles kernel image
addresses (and checks its input is a kernel image address). This matches
what we do elsewhere, e.g. in arch/arm64/include/asm/pgtable.h:

| #define ZERO_PAGE(vaddr)       phys_to_page(__pa_symbol(empty_zero_page))

Fixes: 3744b5280e ("arm64: kexec: install a copy of the linear-map")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Link: https://lore.kernel.org/r/20211130121849.3319010-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-12-02 10:17:12 +00:00
Marc Zyngier
2d761dbf7f Merge branch kvm-arm64/fpsimd-tracking into kvmarm-master/next
* kvm-arm64/fpsimd-tracking:
  : .
  : Simplify the handling of both the FP/SIMD and SVE state by
  : removing the need for mapping the thread at EL2, and by
  : dropping the tracking of the host's SVE state which is
  : always invalid by construction.
  : .
  arm64/fpsimd: Document the use of TIF_FOREIGN_FPSTATE by KVM
  KVM: arm64: Stop mapping current thread_info at EL2
  KVM: arm64: Introduce flag shadowing TIF_FOREIGN_FPSTATE
  KVM: arm64: Remove unused __sve_save_state
  KVM: arm64: Get rid of host SVE tracking/saving
  KVM: arm64: Reorder vcpu flag definitions

Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-12-01 12:13:44 +00:00
Mark Rutland
342b380878 arm64: Snapshot thread flags
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.

To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.

Convert them all to the new flag accessor helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-7-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Jakub Kicinski
93d5404e89 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_main.c
  8afc7e471a ("net: ipa: separate disabling setup from modem stop")
  76b5fbcd6b ("net: ipa: kill ipa_modem_init()")

Duplicated include, drop one.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-26 13:45:19 -08:00
Marc Zyngier
31aa126de8 arm64/fpsimd: Document the use of TIF_FOREIGN_FPSTATE by KVM
The bit of documentation that talks about TIF_FOREIGN_FPSTATE
does not mention the ungodly tricks that KVM plays with this flag.

Try and document this for the posterity.

Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
2021-11-22 16:01:39 +00:00
Sean Christopherson
be399d824b KVM: arm64: Hide kvm_arm_pmu_available behind CONFIG_HW_PERF_EVENTS=y
Move the definition of kvm_arm_pmu_available to pmu-emul.c and, out of
"necessity", hide it behind CONFIG_HW_PERF_EVENTS.  Provide a stub for
the key's wrapper, kvm_arm_support_pmu_v3().  Moving the key's definition
out of perf.c will allow a future commit to delete perf.c entirely.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211111020738.2512932-16-seanjc@google.com
2021-11-17 14:49:11 +01:00
Sean Christopherson
1c3430516b perf: Add wrappers for invoking guest callbacks
Add helpers for the guest callbacks to prepare for burying the callbacks
behind a Kconfig (it's a lot easier to provide a few stubs than to #ifdef
piles of code), and also to prepare for converting the callbacks to
static_call().  perf_instruction_pointer() in particular will have subtle
semantics with static_call(), as the "no callbacks" case will return 0 if
the callbacks are unregistered between querying guest state and getting
the IP.  Implement the change now to avoid a functional change when adding
static_call() support, and because the new helper needs to return
_something_ in this case.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20211111020738.2512932-8-seanjc@google.com
2021-11-17 14:49:08 +01:00
Like Xu
b9f5621c95 perf/core: Rework guest callbacks to prepare for static_call support
To prepare for using static_calls to optimize perf's guest callbacks,
replace ->is_in_guest and ->is_user_mode with a new multiplexed hook
->state, tweak ->handle_intel_pt_intr to play nice with being called when
there is no active guest, and drop "guest" from ->get_guest_ip.

Return '0' from ->state and ->handle_intel_pt_intr to indicate "not in
guest" so that DEFINE_STATIC_CALL_RET0 can be used to define the static
calls, i.e. no callback == !guest.

[sean: extracted from static_call patch, fixed get_ip() bug, wrote changelog]
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lore.kernel.org/r/20211111020738.2512932-7-seanjc@google.com
2021-11-17 14:49:07 +01:00
Sean Christopherson
ff083a2d97 perf: Protect perf_guest_cbs with RCU
Protect perf_guest_cbs with RCU to fix multiple possible errors.  Luckily,
all paths that read perf_guest_cbs already require RCU protection, e.g. to
protect the callback chains, so only the direct perf_guest_cbs touchpoints
need to be modified.

Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
Fixed via the READ_ONCE() in rcu_dereference().

Bug #2 is that on weakly-ordered architectures, updates to the callbacks
themselves are not guaranteed to be visible before the pointer is made
visible to readers.  Fixed by the smp_store_release() in
rcu_assign_pointer() when the new pointer is non-NULL.

Bug #3 is that, because the callbacks are global, it's possible for
readers to run in parallel with an unregisters, and thus a module
implementing the callbacks can be unloaded while readers are in flight,
resulting in a use-after-free.  Fixed by a synchronize_rcu() call when
unregistering callbacks.

Bug #1 escaped notice because it's extremely unlikely a compiler will
reload perf_guest_cbs in this sequence.  perf_guest_cbs does get reloaded
for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest()
guard all but guarantees the consumer will win the race, e.g. to nullify
perf_guest_cbs, KVM has to completely exit the guest and teardown down
all VMs before KVM start its module unload / unregister sequence.  This
also makes it all but impossible to encounter bug #3.

Bug #2 has not been a problem because all architectures that register
callbacks are strongly ordered and/or have a static set of callbacks.

But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping
perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
kvm_intel module load/unload leads to:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor read access in kernel mode
  #PF: error_code(0x0000) - not-present page
  PGD 0 P4D 0
  Oops: 0000 [#1] PREEMPT SMP
  CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  RIP: 0010:perf_misc_flags+0x1c/0x70
  Call Trace:
   perf_prepare_sample+0x53/0x6b0
   perf_event_output_forward+0x67/0x160
   __perf_event_overflow+0x52/0xf0
   handle_pmi_common+0x207/0x300
   intel_pmu_handle_irq+0xcf/0x410
   perf_event_nmi_handler+0x28/0x50
   nmi_handle+0xc7/0x260
   default_do_nmi+0x6b/0x170
   exc_nmi+0x103/0x130
   asm_exc_nmi+0x76/0xbf

Fixes: 39447b386c ("perf: Enhance perf to allow for guest statistic collection from host")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com
2021-11-17 14:49:06 +01:00
Mark Rutland
c6d3cd32fd arm64: ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
When CONFIG_FUNCTION_GRAPH_TRACER is selected and the function graph
tracer is in use, unwind_frame() may erroneously associate a traced
function with an incorrect return address. This can happen when starting
an unwind from a pt_regs, or when unwinding across an exception
boundary.

This can be seen when recording with perf while the function graph
tracer is in use. For example:

| # echo function_graph > /sys/kernel/debug/tracing/current_tracer
| # perf record -g -e raw_syscalls:sys_enter:k /bin/true
| # perf report

... reports the callchain erroneously as:

| el0t_64_sync
| el0t_64_sync_handler
| el0_svc_common.constprop.0
| perf_callchain
| get_perf_callchain
| syscall_trace_enter
| syscall_trace_enter

... whereas when the function graph tracer is not in use, it reports:

| el0t_64_sync
| el0t_64_sync_handler
| el0_svc
| do_el0_svc
| el0_svc_common.constprop.0
| syscall_trace_enter
| syscall_trace_enter

The underlying problem is that ftrace_graph_get_ret_stack() takes an
index offset from the most recent entry added to the fgraph return
stack. We start an unwind at offset 0, and increment the offset each
time we encounter a rewritten return address (i.e. when we see
`return_to_handler`). This is broken in two cases:

1) Between creating a pt_regs and starting the unwind, function calls
   may place entries on the stack, leaving an arbitrary offset which we
   can only determine by performing a full unwind from the caller of the
   unwind code (and relying on none of the unwind code being
   instrumented).

   This can result in erroneous entries being reported in a backtrace
   recorded by perf or kfence when the function graph tracer is in use.
   Currently show_regs() is unaffected as dump_backtrace() performs an
   initial unwind.

2) When unwinding across an exception boundary (whether continuing an
   unwind or starting a new unwind from regs), we currently always skip
   the LR of the interrupted context. Where this was live and contained
   a rewritten address, we won't consume the corresponding fgraph ret
   stack entry, leaving subsequent entries off-by-one.

   This can result in erroneous entries being reported in a backtrace
   performed by any in-kernel unwinder when that backtrace crosses an
   exception boundary, with entries after the boundary being reported
   incorrectly. This includes perf, kfence, show_regs(), panic(), etc.

To fix this, we need to be able to uniquely identify each rewritten
return address such that we can map this back to the original return
address. We can use HAVE_FUNCTION_GRAPH_RET_ADDR_PTR to associate
each rewritten return address with a unique location on the stack. As
the return address is passed in the LR (and so is not guaranteed a
unique location in memory), we use the FP upon entry to the function
(i.e. the address of the caller's frame record) as the return address
pointer. Any nested call will have a different FP value as the caller
must create its own frame record and update FP to point to this.

Since ftrace_graph_ret_addr() requires the return address with the PAC
stripped, the stripping of the PAC is moved before the fixup of the
rewritten address. As we would unconditionally strip the PAC, moving
this earlier is not harmful, and we can avoid a redundant strip in the
return address fixup code.

I've tested this with the perf case above, the ftrace selftests, and
a number of ad-hoc unwinder tests. The tests all pass, and I have seen
no unexpected behaviour as a result of this change. I've tested with
pointer authentication under QEMU TCG where magic-sysrq+l correctly
recovers the original return addresses.

Note that this doesn't fix the issue of skipping a live LR at an
exception boundary, which is a more general problem and requires more
substantial rework. Were we to consume the LR in all cases this would
result in warnings where the interrupted context's LR contains
`return_to_handler`, but the FP has been altered, e.g.

| func:
|	<--- ftrace entry ---> 	// logs FP & LR, rewrites LR
| 	STP	FP, LR, [SP, #-16]!
| 	MOV	FP, SP
| 	<--- INTERRUPT --->

... as ftrace_graph_get_ret_stack() fill not find a matching entry,
triggering the WARN_ON_ONCE() in unwind_frame().

Link: https://lore.kernel.org/r/20211025164925.GB2001@C02TD0UTHF1T.local
Link: https://lore.kernel.org/r/20211027132529.30027-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20211029162245.39761-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-16 09:47:54 +00:00
Jakub Kicinski
a5bdc36354 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-11-15

We've added 72 non-merge commits during the last 13 day(s) which contain
a total of 171 files changed, 2728 insertions(+), 1143 deletions(-).

The main changes are:

1) Add btf_type_tag attributes to bring kernel annotations like __user/__rcu to
   BTF such that BPF verifier will be able to detect misuse, from Yonghong Song.

2) Big batch of libbpf improvements including various fixes, future proofing APIs,
   and adding a unified, OPTS-based bpf_prog_load() low-level API, from Andrii Nakryiko.

3) Add ingress_ifindex to BPF_SK_LOOKUP program type for selectively applying the
   programmable socket lookup logic to packets from a given netdev, from Mark Pashmfouroush.

4) Remove the 128M upper JIT limit for BPF programs on arm64 and add selftest to
   ensure exception handling still works, from Russell King and Alan Maguire.

5) Add a new bpf_find_vma() helper for tracing to map an address to the backing
   file such as shared library, from Song Liu.

6) Batch of various misc fixes to bpftool, fixing a memory leak in BPF program dump,
   updating documentation and bash-completion among others, from Quentin Monnet.

7) Deprecate libbpf bpf_program__get_prog_info_linear() API and migrate its users as
   the API is heavily tailored around perf and is non-generic, from Dave Marchevsky.

8) Enable libbpf's strict mode by default in bpftool and add a --legacy option as an
   opt-out for more relaxed BPF program requirements, from Stanislav Fomichev.

9) Fix bpftool to use libbpf_get_error() to check for errors, from Hengqi Chen.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (72 commits)
  bpftool: Use libbpf_get_error() to check error
  bpftool: Fix mixed indentation in documentation
  bpftool: Update the lists of names for maps and prog-attach types
  bpftool: Fix indent in option lists in the documentation
  bpftool: Remove inclusion of utilities.mak from Makefiles
  bpftool: Fix memory leak in prog_dump()
  selftests/bpf: Fix a tautological-constant-out-of-range-compare compiler warning
  selftests/bpf: Fix an unused-but-set-variable compiler warning
  bpf: Introduce btf_tracing_ids
  bpf: Extend BTF_ID_LIST_GLOBAL with parameter for number of IDs
  bpftool: Enable libbpf's strict mode by default
  docs/bpf: Update documentation for BTF_KIND_TYPE_TAG support
  selftests/bpf: Clarify llvm dependency with btf_tag selftest
  selftests/bpf: Add a C test for btf_type_tag
  selftests/bpf: Rename progs/tag.c to progs/btf_decl_tag.c
  selftests/bpf: Test BTF_KIND_DECL_TAG for deduplication
  selftests/bpf: Add BTF_KIND_TYPE_TAG unit tests
  selftests/bpf: Test libbpf API function btf__add_type_tag()
  bpftool: Support BTF_KIND_TYPE_TAG
  libbpf: Support BTF_KIND_TYPE_TAG
  ...
====================

Link: https://lore.kernel.org/r/20211115162008.25916-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-15 08:49:23 -08:00
Linus Torvalds
c55a04176c Merge tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc fix from Greg KH:
 "Here is a single fix for 5.16-rc1 to resolve a build problem that came
  in through the coresight tree (and as such came in through the
  char/misc tree merge in the 5.16-rc1 merge window).

  It resolves a build problem with 'allmodconfig' on arm64 and is acked
  by the proper subsystem maintainers. It has been in linux-next all
  week with no reported problems"

* tag 'char-misc-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  arm64: cpufeature: Export this_cpu_has_cap helper
2021-11-11 09:44:29 -08:00
Linus Torvalds
89fa0be0a0 Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:

 - Fix double-evaluation of 'pte' macro argument when using 52-bit PAs

 - Fix signedness of some MTE prctl PR_* constants

 - Fix kmemleak memory usage by skipping early pgtable allocations

 - Fix printing of CPU feature register strings

 - Remove redundant -nostdlib linker flag for vDSO binaries

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: pgtable: make __pte_to_phys/__phys_to_pte_val inline functions
  arm64: Track no early_pgtable_alloc() for kmemleak
  arm64: mte: change PR_MTE_TCF_NONE back into an unsigned long
  arm64: vdso: remove -nostdlib compiler flag
  arm64: arm64_ftr_reg->name may not be a human-readable string
2021-11-10 11:29:30 -08:00
Russell King
b89ddf4cca arm64/bpf: Remove 128MB limit for BPF JIT programs
Commit 91fc957c9b ("arm64/bpf: don't allocate BPF JIT programs in module
memory") restricts BPF JIT program allocation to a 128MB region to ensure
BPF programs are still in branching range of each other. However this
restriction should not apply to the aarch64 JIT, since BPF_JMP | BPF_CALL
are implemented as a 64-bit move into a register and then a BLR instruction -
which has the effect of being able to call anything without proximity
limitation.

The practical reason to relax this restriction on JIT memory is that 128MB of
JIT memory can be quickly exhausted, especially where PAGE_SIZE is 64KB - one
page is needed per program. In cases where seccomp filters are applied to
multiple VMs on VM launch - such filters are classic BPF but converted to
BPF - this can severely limit the number of VMs that can be launched. In a
world where we support BPF JIT always on, turning off the JIT isn't always an
option either.

Fixes: 91fc957c9b ("arm64/bpf: don't allocate BPF JIT programs in module memory")
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <russell.king@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/1636131046-5982-2-git-send-email-alan.maguire@oracle.com
2021-11-08 22:16:26 +01:00
Linus Torvalds
1e9ed9360f Merge tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:

 - Remove the global -isystem compiler flag, which was made possible by
   the introduction of <linux/stdarg.h>

 - Improve the Kconfig help to print the location in the top menu level

 - Fix "FORCE prerequisite is missing" build warning for sparc

 - Add new build targets, tarzst-pkg and perf-tarzst-src-pkg, which
   generate a zstd-compressed tarball

 - Prevent gen_init_cpio tool from generating a corrupted cpio when
   KBUILD_BUILD_TIMESTAMP is set to 2106-02-07 or later

 - Misc cleanups

* tag 'kbuild-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (28 commits)
  kbuild: use more subdir- for visiting subdirectories while cleaning
  sh: remove meaningless archclean line
  initramfs: Check timestamp to prevent broken cpio archive
  kbuild: split DEBUG_CFLAGS out to scripts/Makefile.debug
  gen_init_cpio: add static const qualifiers
  kbuild: Add make tarzst-pkg build option
  scripts: update the comments of kallsyms support
  sparc: Add missing "FORCE" target when using if_changed
  kconfig: refactor conf_touch_dep()
  kconfig: refactor conf_write_dep()
  kconfig: refactor conf_write_autoconf()
  kconfig: add conf_get_autoheader_name()
  kconfig: move sym_escape_string_value() to confdata.c
  kconfig: refactor listnewconfig code
  kconfig: refactor conf_write_symbol()
  kconfig: refactor conf_write_heading()
  kconfig: remove 'const' from the return type of sym_escape_string_value()
  kconfig: rename a variable in the lexer to a clearer name
  kconfig: narrow the scope of variables in the lexer
  kconfig: Create links to main menu items in search
  ...
2021-11-08 09:15:45 -08:00
Masahiro Yamada
34688c7691 arm64: vdso: remove -nostdlib compiler flag
The -nostdlib option requests the compiler to not use the standard
system startup files or libraries when linking. It is effective only
when $(CC) is used as a linker driver.

Since commit 691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC)
to link VDSO"), $(LD) is directly used, hence -nostdlib is unneeded.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20211107161802.323125-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:02:57 +00:00
Reiji Watanabe
9dc232a8ab arm64: arm64_ftr_reg->name may not be a human-readable string
The id argument of ARM64_FTR_REG_OVERRIDE() is used for two purposes:
one as the system register encoding (used for the sys_id field of
__ftr_reg_entry), and the other as the register name (stringified
and used for the name field of arm64_ftr_reg), which is debug
information. The id argument is supposed to be a macro that
indicates an encoding of the register (eg. SYS_ID_AA64PFR0_EL1, etc).

ARM64_FTR_REG(), which also has the same id argument,
uses ARM64_FTR_REG_OVERRIDE() and passes the id to the macro.
Since the id argument is completely macro-expanded before it is
substituted into a macro body of ARM64_FTR_REG_OVERRIDE(),
the stringified id in the body of ARM64_FTR_REG_OVERRIDE is not
a human-readable register name, but a string of numeric bitwise
operations.

Fix this so that human-readable register names are available as
debug information.

Fixes: 8f266a5d87 ("arm64: cpufeature: Add global feature override facility")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211101045421.2215822-1-reijiw@google.com
Signed-off-by: Will Deacon <will@kernel.org>
2021-11-08 10:02:36 +00:00
Greg Kroah-Hartman
a4119be437 Merge tag 'coresight-fixes-v5.16' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux into char-misc-linus
Mathieu writes:

coresight: Fix for v5.16

This fix enables to compile the TRBE driver as a module by
exporting function this_cpu_has_cap().

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>

* tag 'coresight-fixes-v5.16' of gitolite.kernel.org:pub/scm/linux/kernel/git/coresight/linux:
  arm64: cpufeature: Export this_cpu_has_cap helper
2021-11-07 11:49:35 +01:00