This marks more files and functions that can possibly be called in
real mode so that they are not instrumented by KASAN. Most were found
by inspection, except for get_pseries_errorlog() which was reported as
causing a crash in testing.
Reported-by: Nageswara R Sastry <rnsastry@linux.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoX1kZPnmUX4RZEK@cleo
Implement a limited form of KASAN for Book3S 64-bit machines running under
the Radix MMU, supporting only outline mode.
- Enable the compiler instrumentation to check addresses and maintain the
shadow region. (This is the guts of KASAN which we can easily reuse.)
- Require kasan-vmalloc support to handle modules and anything else in
vmalloc space.
- KASAN needs to be able to validate all pointer accesses, but we can't
instrument all kernel addresses - only linear map and vmalloc. On boot,
set up a single page of read-only shadow that marks all iomap and
vmemmap accesses as valid.
- Document KASAN in powerpc docs.
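As background for the shadow-region point above, the compiler-emitted
checks translate every access through the usual KASAN shadow mapping,
roughly like this (a sketch of the generic helper; the powerpc shadow
offset is whatever the architecture selects):

static inline void *kasan_mem_to_shadow(const void *addr)
{
	/* One shadow byte tracks the state of eight bytes of memory */
	return (void *)((unsigned long)addr >> KASAN_SHADOW_SCALE_SHIFT)
		+ KASAN_SHADOW_OFFSET;
}

The single read-only shadow page mentioned above simply makes this
translation land on memory that always reads as "valid" for iomap and
vmemmap addresses.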
Background
----------
KASAN support on Book3S is a bit tricky to get right:
- It would be good to support inline instrumentation so as to be able to
catch stack issues that cannot be caught with outline mode.
- Inline instrumentation requires a fixed offset.
- Book3S runs code with translations off ("real mode") during boot,
including a lot of generic device-tree parsing code which is used to
determine MMU features.
[ppc64 mm note: The kernel installs a linear mapping at effective
address c000...-c008.... This is a one-to-one mapping with physical
memory from 0000... onward. Because of how memory accesses work on
powerpc 64-bit Book3S, a kernel pointer in the linear map accesses the
same memory both with translations on (accessing as an 'effective
address'), and with translations off (accessing as a 'real
address'). This works in both guests and the hypervisor. For more
details, see s5.7 of Book III of version 3 of the ISA, in particular
the Storage Control Overview, s5.7.3, and s5.7.5 - noting that this
KASAN implementation currently only supports Radix.]
- Some code - most notably a lot of KVM code - also runs with translations
off after boot.
- Therefore any offset has to point to memory that is valid with
translations on or off.
One approach is just to give up on inline instrumentation. This way
boot-time checks can be delayed until after the MMU is set up, and we
can just not instrument any code that runs with translations off after
booting. Take this approach for now and require outline instrumentation.
Previous attempts allowed inline instrumentation. However, they came with
some unfortunate restrictions: only physically contiguous memory could be
used and it had to be specified at compile time. Maybe we can do better in
the future.
[paulus@ozlabs.org - Rebased onto 5.17. Note that a kernel with
CONFIG_KASAN=y will crash during boot on a machine using HPT
translation because not all the entry points to the generic
KASAN code are protected with a call to kasan_arch_is_ready().]
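For reference, the guard mentioned above has roughly this shape (a
sketch; the static key name is an assumption modelled on the powerpc
implementation):

static inline bool kasan_arch_is_ready(void)
{
	/* Skip shadow checks until early boot has set up the shadow region */
	if (static_branch_likely(&powerpc_kasan_enabled_key))
		return true;
	return false;
}

Until every generic KASAN entry point checks this, booting with HPT
translation will crash as described above.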
Originally-by: Balbir Singh <bsingharora@gmail.com> # ppc64 out-of-line radix version
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Update copyright year and comment formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/YoTE69OQwiG7z+Gu@cleo
Pull powerpc updates from Michael Ellerman:
- Optimise radix KVM guest entry/exit by 2x on Power9/Power10.
- Allow firmware to tell us whether to disable the entry and uaccess
flushes on Power10 or later CPUs.
- Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.
- Several fixes and improvements to our hard lockup watchdog.
- Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.
- Allow building the 64-bit Book3S kernel without hash MMU support, ie.
Radix only.
- Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).
- Add new encodings for perf_mem_data_src.mem_hops field, and use them
on Power10.
- A series of small performance improvements to 64-bit interrupt entry.
- Several commits fixing issues when building with the clang integrated
assembler.
- Many other small features and fixes.
Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell,
Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET,
Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard
Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason
Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour,
Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh
Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child,
Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring,
Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool,
Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang
Guang.
* tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits)
powerpc/xmon: Dump XIVE information for online-only processors.
powerpc/opal: use default_groups in kobj_type
powerpc/cacheinfo: use default_groups in kobj_type
powerpc/sched: Remove unused TASK_SIZE_OF
powerpc/xive: Add missing null check after calling kmalloc
powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API
selftests/powerpc: Add a test of sigreturning to an unaligned address
powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
powerpc/64s: Mask NIP before checking against SRR0
powerpc/perf: Fix spelling of "its"
powerpc/32: Fix boot failure with GCC latent entropy plugin
powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests
powerpc/code-patching: Move code patching selftests in its own file
powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h
powerpc/code-patching: Move patch_exception() outside code-patching.c
powerpc/code-patching: Use test_trampoline for prefixed patch test
powerpc/code-patching: Fix patch_branch() return on out-of-range failure
powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling
powerpc/code-patching: Fix unmap_patch_area() error handling
powerpc/code-patching: Fix error handling in do_patch_instruction()
...
This reverts commit 566af8cda3.
This caused some conflicts vs the audit tree, and the audit maintainers
would prefer we postpone this to the next merge window so we have more
time for testing.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Add support for out-of-line static calls on PPC32. This change
improves performance of calls to global function pointers by
using direct calls instead of indirect calls.
The trampoline is initially populated with a 'blr' or branch to target,
followed by an unreachable long jump sequence.
In order to cater for parallel execution, the trampoline needs to
be updated in a way that ensures it remains consistent at all times.
This means we can't use the traditional lis/addi to load r12 with
the target address, otherwise there would be a window during which
the first instruction contains the upper part of the new target
address while the second instruction still contains the lower part of
the old target address. To avoid that, the target address is stored
just after the 'bctr' and loaded from there with a single instruction.
Then, depending on the target distance, arch_static_call_transform()
will replace the first instruction with either a direct 'bl <target>'
or a 'nop' in order to have the trampoline fall through to the long
jump sequence.
For the special case of __static_call_return0(), to avoid the risk of
a far branch, a version of it is inlined at the end of the trampoline.
Performance-wise, the long jump sequence is probably no better than
the indirect calls emitted by GCC when we don't use static calls, but
such calls are unlikely to be required on powerpc32: with most
configurations the kernel size is far below 32 Mbytes, so only
modules may happen to be too far away. And even modules are likely to
be close enough, as they are allocated below the kernel core and
as close as possible to the kernel text.
The static_call selftest runs successfully with this change.
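The resulting patching logic looks roughly like this (a simplified
sketch with error handling dropped; PPC_SCT_DATA and PPC_SCT_RET0 name
the offsets of the stored target address and the inlined return-0 stub
within the trampoline):

void arch_static_call_transform(void *site, void *tramp, void *func, bool tail)
{
	bool is_ret0 = (func == __static_call_return0);
	unsigned long target = (unsigned long)(is_ret0 ? tramp + PPC_SCT_RET0 : func);
	bool is_short = is_offset_in_branch_range((long)target - (long)tramp);

	if (!tramp)
		return;

	mutex_lock(&text_mutex);

	/* Update the stored target first so the long jump stays consistent */
	if (func && !is_short)
		patch_instruction(tramp + PPC_SCT_DATA, ppc_inst(target));

	if (!func)
		patch_instruction(tramp, ppc_inst(PPC_RAW_BLR()));
	else if (is_short)
		patch_branch(tramp, target, 0);	/* direct branch to target */
	else
		patch_instruction(tramp, ppc_inst(PPC_RAW_NOP()));	/* fall through */

	mutex_unlock(&text_mutex);
}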
With this patch, __do_irq() has the following sequence to trace
irq entries:
c0004a00 <__SCT__tp_func_irq_entry>:
c0004a00: 48 00 00 e0 b c0004ae0 <__traceiter_irq_entry>
c0004a04: 3d 80 c0 00 lis r12,-16384
c0004a08: 81 8c 4a 1c lwz r12,18972(r12)
c0004a0c: 7d 89 03 a6 mtctr r12
c0004a10: 4e 80 04 20 bctr
c0004a14: 38 60 00 00 li r3,0
c0004a18: 4e 80 00 20 blr
c0004a1c: 00 00 00 00 .long 0x0
...
c0005654 <__do_irq>:
...
c0005664: 7c 7f 1b 78 mr r31,r3
...
c00056a0: 81 22 00 00 lwz r9,0(r2)
c00056a4: 39 29 00 01 addi r9,r9,1
c00056a8: 91 22 00 00 stw r9,0(r2)
c00056ac: 3d 20 c0 af lis r9,-16209
c00056b0: 81 29 74 cc lwz r9,29900(r9)
c00056b4: 2c 09 00 00 cmpwi r9,0
c00056b8: 41 82 00 10 beq c00056c8 <__do_irq+0x74>
c00056bc: 80 69 00 04 lwz r3,4(r9)
c00056c0: 7f e4 fb 78 mr r4,r31
c00056c4: 4b ff f3 3d bl c0004a00 <__SCT__tp_func_irq_entry>
Before this patch, __do_irq() was doing the following to trace irq
entries:
c0005700 <__do_irq>:
...
c0005710: 7c 7e 1b 78 mr r30,r3
...
c000574c: 93 e1 00 0c stw r31,12(r1)
c0005750: 81 22 00 00 lwz r9,0(r2)
c0005754: 39 29 00 01 addi r9,r9,1
c0005758: 91 22 00 00 stw r9,0(r2)
c000575c: 3d 20 c0 af lis r9,-16209
c0005760: 83 e9 f4 cc lwz r31,-2868(r9)
c0005764: 2c 1f 00 00 cmpwi r31,0
c0005768: 41 82 00 24 beq c000578c <__do_irq+0x8c>
c000576c: 81 3f 00 00 lwz r9,0(r31)
c0005770: 80 7f 00 04 lwz r3,4(r31)
c0005774: 7d 29 03 a6 mtctr r9
c0005778: 7f c4 f3 78 mr r4,r30
c000577c: 4e 80 04 21 bctrl
c0005780: 85 3f 00 0c lwzu r9,12(r31)
c0005784: 2c 09 00 00 cmpwi r9,0
c0005788: 40 82 ff e4 bne c000576c <__do_irq+0x6c>
Besides now using a direct 'bl' instead of a 'load/mtctr/bctr'
sequence, we can also see that one less register is saved on the
stack.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu
Pull powerpc updates from Michael Ellerman:
- A large series adding wrappers for our interrupt handlers, so that
irq/nmi/user tracking can be isolated in the wrappers rather than
spread in each handler.
- Conversion of the 32-bit syscall handling into C.
- A series from Nick to streamline our TLB flushing when using the
Radix MMU.
- Switch to using queued spinlocks by default for 64-bit server CPUs.
- A rework of our PCI probing so that it happens later in boot, when
more generic infrastructure is available.
- Two small fixes to allow 32-bit little-endian processes to run on
64-bit kernels.
- Other smaller features, fixes & cleanups.
Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh
Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang
Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian
Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong,
Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan
Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu,
Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen
Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun.
* tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits)
powerpc/perf: Adds support for programming of Thresholding in P10
powerpc/pci: Remove unimplemented prototypes
powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user()
powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size()
powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user()
powerpc/64: Fix stack trace not displaying final frame
powerpc/time: Remove get_tbl()
powerpc/time: Avoid using get_tbl()
spi: mpc52xx: Avoid using get_tbl()
powerpc/syscall: Avoid storing 'current' in another pointer
powerpc/32: Handle bookE debugging in C in syscall entry/exit
powerpc/syscall: Do not check unsupported scv vector on PPC32
powerpc/32: Remove the counter in global_dbcr0
powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry
powerpc/syscall: implement system call entry/exit logic in C for PPC32
powerpc/32: Always save non volatile GPRs at syscall entry
powerpc/syscall: Change condition to check MSR_RI
powerpc/syscall: Save r3 in regs->orig_r3
powerpc/syscall: Use is_compat_task()
powerpc/syscall: Make interrupt.c buildable on PPC32
...
vgettimeofday.o is unnecessarily rebuilt. Adding it to 'targets' is not
enough to fix the issue. Kbuild is correctly rebuilding it because the
command line is changed.
PowerPC builds each vdso directory twice; first in vdso_prepare to
generate vdso{32,64}-offsets.h, second as part of the ordinary build
process to embed vdso{32,64}.so.dbg into the kernel.
The problem shows up when CONFIG_PPC_WERROR=y due to the following line
in arch/powerpc/Kbuild:
subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
In the preparation stage, Kbuild directly visits the vdso directories,
hence it does not inherit subdir-ccflags-y. In the second descend,
Kbuild adds -Werror, which results in the command line flipping
with/without -Werror.
This implies a potential danger: if a more critical flag flipped in the
same way and impacted the resulting vdso, the offsets recorded in the
headers might differ from the real offsets in the embedded vdso images.
Removing the unneeded second descend solves the problem.
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/linuxppc-dev/87tuslxhry.fsf@mpe.ellerman.id.au/
Link: https://lore.kernel.org/r/20201223171142.707053-1-masahiroy@kernel.org
Commit 7053f80d96 ("powerpc/64: Prevent stack protection in early
boot") introduced a couple of uses of __attribute__((optimize)) with
function scope, to disable the stack protector in some early boot
code.
Unfortunately, and this is documented in the GCC man pages [0],
overriding function attributes for optimization is broken, and is only
supported for debug scenarios, not for production: the problem appears
to be that setting GCC -f flags using this method will cause it to
forget about some or all other optimization settings that have been
applied.
So the only safe way to disable the stack protector is to disable it
for the entire source file.
[0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
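For illustration, the fragile pattern and its per-file replacement look
like this (function and file names are illustrative):

/* Fragile: a function-scoped optimize attribute may drop other -f flags */
static void __attribute__((optimize("no-stack-protector")))
early_setup(unsigned long dt_ptr)
{
	/* early boot code that runs before the stack canary is set up */
}

/*
 * Safe alternative: disable the stack protector for the whole source
 * file from the Makefile instead, e.g.:
 *
 *	CFLAGS_setup_64.o += -fno-stack-protector
 */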
Fixes: 7053f80d96 ("powerpc/64: Prevent stack protection in early boot")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
[mpe: Drop one remaining use of __nostackprotector, reported by snowpatch]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201028080433.26799-1-ardb@kernel.org
Unlike PPC64, which has a single head_64.S, PPC32 has multiple ones.
There is head_32.S, selected by default based on the value of BITS
and overridden based on some CONFIG_ values. This leads one to think
that it may be selected by different types of PPC32 platforms, but in
fact it ends up being selected by book3s/32 only.
Make that explicit by:
- Not doing any default selection based on BITS.
- Renaming head_32.S to head_book3s_32.S.
- Having head_book3s_32.S selected only by CONFIG_PPC_BOOK3S_32.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Fix head_$(BITS).o reference in arch/powerpc/Makefile]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/319d379f696412681c66a987cc75e6abf8f958d2.1601975100.git.christophe.leroy@csgroup.eu
Power10 hardware has multiple DAWRs, but the hardware doesn't tell
which DAWR caused the exception, so we have software logic to detect
that in hw_breakpoint.c. But hw_breakpoint.c gets compiled only with
CONFIG_HAVE_HW_BREAKPOINT=y.
Move DAWR detection logic outside of hw_breakpoint.c so that it can be
reused when CONFIG_HAVE_HW_BREAKPOINT is not set.
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200902042945.129369-5-ravi.bangoria@linux.ibm.com
Pull Kbuild updates from Masahiro Yamada:
- run the checker (e.g. sparse) after the compiler
- remove unneeded cc-option tests for old compiler flags
- fix tar-pkg to install dtbs
- introduce ccflags-remove-y and asflags-remove-y syntax
- allow tracing functions in sub-directories of lib/
- introduce hostprogs-always-y and userprogs-always-y syntax
- various Makefile cleanups
* tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base
kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled
kbuild: introduce hostprogs-always-y and userprogs-always-y
kbuild: sort hostprogs before passing it to ifneq
kbuild: move host .so build rules to scripts/gcc-plugins/Makefile
kbuild: Replace HTTP links with HTTPS ones
kbuild: trace functions in subdirectories of lib/
kbuild: introduce ccflags-remove-y and asflags-remove-y
kbuild: do not export LDFLAGS_vmlinux
kbuild: always create directories of targets
powerpc/boot: add DTB to 'targets'
kbuild: buildtar: add dtbs support
kbuild: remove cc-option test of -ffreestanding
kbuild: remove cc-option test of -fno-stack-protector
Revert "kbuild: Create directory for target DTB"
kbuild: run the checker after the compiler
KVM guests have certain restrictions and performance quirks when using
doorbells. This patch moves the EPAPR KVM guest test so it can be shared
with PSERIES, and uses it in the doorbell setup code to apply the KVM
guest quirks, improving IPI performance in two cases:
- PowerVM guests may now use doorbells even if they are secure.
- KVM guests no longer use doorbells if XIVE is available.
There is a valid complaint that "KVM guest" is not a very reasonable
thing to test for; it's preferable for the hypervisor to advertise
particular behaviours to the guest, so they could change if the
hypervisor implementation or configuration changes. However, in this
case we were already assuming a KVM guest worst case, so this patch is
about containing those quirks. If KVM later advertises fast doorbells,
we should test for that and override the quirks.
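The shared test boils down to a static key set from a device-tree
lookup at boot, along these lines (a sketch; names modelled on the
kernel's KVM guest helpers):

DEFINE_STATIC_KEY_FALSE(kvm_guest);

static inline bool is_kvm_guest(void)
{
	return static_branch_unlikely(&kvm_guest);
}

void __init check_kvm_guest(void)
{
	/* Set once at boot if the device tree advertises a KVM hypervisor */
	struct device_node *hyper_node = of_find_node_by_path("/hypervisor");

	if (hyper_node && of_device_is_compatible(hyper_node, "linux,kvm"))
		static_branch_enable(&kvm_guest);
}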
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200726035155.1424103-4-npiggin@gmail.com
Some Makefiles already pass -fno-stack-protector unconditionally.
For example, arch/arm64/kernel/vdso/Makefile, arch/x86/xen/Makefile.
There have been no problem reports so far about hard-coding this
option, so we can assume all supported compilers know
-fno-stack-protector.
GCC 4.8 and Clang support this option (https://godbolt.org/z/_HDGzN)
Get rid of cc-option from -fno-stack-protector.
Remove CONFIG_CC_HAS_STACKPROTECTOR_NONE, which is always 'y'.
Note:
arch/mips/vdso/Makefile adds -fno-stack-protector twice, first
unconditionally, and second conditionally. I removed the second one.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
System call entry and particularly exit code is beyond the limit of
what is reasonable to implement in asm.
This conversion moves all conditional branches out of the asm code,
except for the case that all GPRs should be restored at exit.
Null syscall test is about 5% faster after this patch, because the
exit work is handled under local_irq_disable, and the hard mask and
pending interrupt replay is handled after that, which avoids games
with MSR.
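The C exit path has roughly the following shape (a heavily simplified
sketch, not the exact kernel code):

notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs)
{
	unsigned long ti_flags;

	local_irq_disable();	/* exit work runs with interrupts disabled */
	ti_flags = current_thread_info()->flags;
	while (unlikely(ti_flags & _TIF_USER_WORK_MASK)) {
		local_irq_enable();
		/* signals, rescheduling, TM state restore, ... */
		do_notify_resume(regs, ti_flags);
		local_irq_disable();
		ti_flags = current_thread_info()->flags;
	}
	/* hard mask and pending interrupt replay happen after this point,
	 * which is what avoids the games with MSR */
	return r3;
}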
mpe: Includes subsequent fixes from Nick:
This fixes 4 issues caught by TM selftests. First was a tm-syscall bug
that hit due to tabort_syscall being called after interrupts were
reconciled (in a subsequent patch), which led to interrupts being
enabled before tabort_syscall was called. Rather than going through
un-reconciling interrupts for the return, I just went back to putting
the test early in asm; the C-ification of that wasn't a big win
anyway.
Second is that the syscall return _TIF_USER_WORK_MASK check would go
into an infinite loop if _TIF_RESTORE_TM became set. The asm code uses
_TIF_USER_WORK_MASK to branch to the slowpath, which includes
restore_tm_state.
Third is that system call return was not calling restore_tm_state; I
missed this completely (although it's in the return-from-interrupt C
conversion), because when the asm syscall code encountered problems it
would branch to the interrupt return code.
Fourth is MSR_VEC missing from restore_math, which was caught by the
tm-unavailable selftest taking an unexpected facility unavailable
interrupt when testing the VSX unavailable exception with MSR.FP=1
MSR.VEC=1. The fourth case also has a fixup in a subsequent patch.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-26-npiggin@gmail.com
This implements the tricky tracing and soft irq handling bits in C,
leaving the low level bit to asm.
A functional difference is that this redirects the interrupt exit to
a return stub to execute blr, rather than the lr address itself. This
is probably barely measurable on real hardware, but it keeps the link
stack balanced.
Tested with QEMU.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move power4_fixup_nap back into exceptions-64s.S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711022404.18132-1-npiggin@gmail.com
LLVM revision r374662 gives LLVM the ability to convert certain loops
into a reference to bcmp as an optimization; this breaks
prom_init_check.sh:
CALL arch/powerpc/kernel/prom_init_check.sh
Error: External symbol 'bcmp' referenced from prom_init.c
make[2]: *** [arch/powerpc/kernel/Makefile:196: prom_init_check] Error 1
bcmp is defined in lib/string.c as a wrapper for memcmp so this could
be added to the whitelist. However, commit
450e7dd400 ("powerpc/prom_init: don't use string functions from
lib/") copied memcmp as prom_memcmp to avoid KASAN instrumentation so
having bcmp be resolved to regular memcmp would break that assumption.
Furthermore, because the compiler is the one that inserted bcmp, we
cannot provide something like prom_bcmp.
To prevent LLVM from being clever with optimizations like this, use
-ffreestanding to tell LLVM we are not hosted, so it is not free to
make such transformations.
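For example, an equality-only comparison loop like the following is a
candidate for that transformation (illustrative code, not from
prom_init.c):

static int arrays_equal(const unsigned char *a, const unsigned char *b,
			unsigned long n)
{
	unsigned long i;

	/* Only equality is tested, so LLVM may lower this to a bcmp() call */
	for (i = 0; i < n; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}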
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191119045712.39633-4-natechancellor@gmail.com
Merge the secureboot support, as well as the IMA changes needed to
support it.
From Nayna's cover letter:
In order to verify the OS kernel on PowerNV systems, secure boot
requires X.509 certificates trusted by the platform. These are
stored in secure variables controlled by OPAL, called OPAL secure
variables. In order to enable users to manage the keys, the secure
variables need to be exposed to userspace.
OPAL provides the runtime services for the kernel to be able to
access the secure variables. This patchset defines the kernel
interface for the OPAL APIs. These APIs are used by the hooks, which
load these variables to the keyring and expose them to the userspace
for reading/writing.
Overall, this patchset adds the following support:
* expose secure variables to the kernel via OPAL Runtime API interface
* expose secure variables to the userspace via kernel sysfs interface
* load kernel verification and revocation keys to .platform and
.blacklist keyring respectively.
The secure variables can be read/written using the simple Linux
utilities cat/hexdump.
For example:
Path to the secure variables is: /sys/firmware/secvar/vars
Each secure variable is listed as a directory.
$ ls -l
total 0
drwxr-xr-x. 2 root root 0 Aug 20 21:20 db
drwxr-xr-x. 2 root root 0 Aug 20 21:20 KEK
drwxr-xr-x. 2 root root 0 Aug 20 21:20 PK
The attributes of each of the secure variables are (for example: PK):
$ ls -l
total 0
-r--r--r--. 1 root root 4096 Oct 1 15:10 data
-r--r--r--. 1 root root 65536 Oct 1 15:10 size
--w-------. 1 root root 4096 Oct 1 15:12 update
The "data" is used to read the existing variable value using
hexdump. The data is stored in ESL format. The "update" is used to
write a new value using cat. The update is to be submitted as AUTH
file.
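For example, reading the current PK value and submitting an update
(PK.auth here is a placeholder for a signed update file):

$ hexdump -C /sys/firmware/secvar/vars/PK/data
$ cat PK.auth > /sys/firmware/secvar/vars/PK/update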
PowerNV secure variables, which store the keys used for OS kernel
verification, are managed by the firmware. These secure variables need to
be accessed by userspace for addition/deletion of the certificates.
This patch adds the sysfs interface to expose secure variables for PowerNV
secureboot. The users shall use this interface for manipulating
the keys stored in the secure variables.
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-3-git-send-email-nayna@linux.ibm.com
The X.509 certificates trusted by the platform and required to secure
boot the OS kernel are wrapped in secure variables, which are
controlled by OPAL.
This patch adds a firmware/kernel interface to read and write OPAL
secure variables based on the unique key.
This support can be enabled using CONFIG_OPAL_SECVAR.
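The interface amounts to a small ops structure keyed by variable name,
roughly (a sketch; the exact prototypes may differ):

struct secvar_operations {
	int (*get)(const char *key, uint64_t key_len,
		   u8 *data, uint64_t *data_size);
	int (*get_next)(const char *key, uint64_t *key_len,
			uint64_t keybufsize);
	int (*set)(const char *key, uint64_t key_len,
		   u8 *data, uint64_t data_size);
};

extern const struct secvar_operations *secvar_ops;

OPAL supplies one implementation of these ops, so consumers such as the
sysfs interface need not call OPAL directly.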
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
[mpe: Make secvar_ops __ro_after_init, only build opal-secvar.c if PPC_SECURE_BOOT=y]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-2-git-send-email-nayna@linux.ibm.com
PowerNV systems use a Linux-based bootloader, which relies on the IMA
subsystem to enforce different secure boot modes. Since the
verification policy may differ based on the secure boot mode of the
system, the policies must be defined at runtime.
This patch implements arch-specific support to define IMA policy rules
based on the runtime secure boot mode of the system.
This patch provides arch-specific IMA policies if PPC_SECURE_BOOT
config is enabled.
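The arch hook is essentially a rule table selected by the runtime
secure boot state, along these lines (a sketch; the rule string is
illustrative and the real policy is richer):

static const char *const secure_rules[] = {
	"appraise func=KEXEC_KERNEL_CHECK appraise_type=imasig",
	NULL
};

/* Returns the runtime IMA policy rules for this boot, or NULL for none */
const char *const *arch_get_ima_policy(void)
{
	if (is_ppc_secureboot_enabled())
		return secure_rules;
	return NULL;
}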
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1572492694-6520-3-git-send-email-zohar@linux.ibm.com
Otherwise the build fails because prom_init is calling symbols it's
not allowed to, eg:
Error: External symbol 'ftrace_likely_update' referenced from prom_init.c
make[3]: *** [arch/powerpc/kernel/Makefile:197: arch/powerpc/kernel/prom_init_check] Error 1
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191106051129.7626-1-mpe@ellerman.id.au
Pull powerpc updates from Michael Ellerman:
"This is a bit late, partly due to me travelling, and partly due to a
power outage knocking out some of my test systems *while* I was
travelling.
- Initial support for running on a system with an Ultravisor, which
is software that runs below the hypervisor and protects guests
against some attacks by the hypervisor.
- Support for building the kernel to run as a "Secure Virtual
Machine", ie. as a guest capable of running on a system with an
Ultravisor.
- Some changes to our DMA code on bare metal, to allow devices with
medium sized DMA masks (> 32 && < 59 bits) to use more than 2GB of
DMA space.
- Support for firmware assisted crash dumps on bare metal (powernv).
- Two series fixing bugs in and refactoring our PCI EEH code.
- A large series refactoring our exception entry code to use gas
macros, both to make it more readable and also enable some future
optimisations.
As well as many cleanups and other minor features & fixups.
Thanks to: Adam Zerella, Alexey Kardashevskiy, Alistair Popple, Andrew
Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anshuman Khandual,
Balbir Singh, Benjamin Herrenschmidt, Cédric Le Goater, Christophe
JAILLET, Christophe Leroy, Christopher M. Riedl, Christoph Hellwig,
Claudio Carvalho, Daniel Axtens, David Gibson, David Hildenbrand,
Desnes A. Nunes do Rosario, Ganesh Goudar, Gautham R. Shenoy, Greg
Kurz, Guerney Hunt, Gustavo Romero, Halil Pasic, Hari Bathini, Joakim
Tjernlund, Jonathan Neuschafer, Jordan Niethe, Leonardo Bras, Lianbo
Jiang, Madhavan Srinivasan, Mahesh Salgaonkar, Mahesh Salgaonkar,
Masahiro Yamada, Maxiwell S. Garcia, Michael Anderson, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver
O'Halloran, Qian Cai, Ram Pai, Ravi Bangoria, Reza Arbab, Ryan Grimm,
Sam Bobroff, Santosh Sivaraj, Segher Boessenkool, Sukadev Bhattiprolu,
Thiago Bauermann, Thiago Jung Bauermann, Thomas Gleixner, Tom
Lendacky, Vasant Hegde"
* tag 'powerpc-5.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (264 commits)
powerpc/mm/mce: Keep irqs disabled during lockless page table walk
powerpc: Use ftrace_graph_ret_addr() when unwinding
powerpc/ftrace: Enable HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
ftrace: Look up the address of return_to_handler() using helpers
powerpc: dump kernel log before carrying out fadump or kdump
docs: powerpc: Add missing documentation reference
powerpc/xmon: Fix output of XIVE IPI
powerpc/xmon: Improve output of XIVE interrupts
powerpc/mm/radix: remove useless kernel messages
powerpc/fadump: support holes in kernel boot memory area
powerpc/fadump: remove RMA_START and RMA_END macros
powerpc/fadump: update documentation about option to release opalcore
powerpc/fadump: consider f/w load area
powerpc/opalcore: provide an option to invalidate /sys/firmware/opal/core file
powerpc/opalcore: export /sys/firmware/opal/core for analysing opal crashes
powerpc/fadump: update documentation about CONFIG_PRESERVE_FA_DUMP
powerpc/fadump: add support to preserve crash data on FADUMP disabled kernel
powerpc/fadump: improve how crashed kernel's memory is reserved
powerpc/fadump: consider reserved ranges while releasing memory
powerpc/fadump: make crash memory ranges array allocation generic
...
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP, that ensures
that crash data from a previously crashed kernel is preserved. This
helps in cases where FADump is not enabled but the subsequent memory
preserving kernel boot is likely to process this crash data. One
typical use case for this config option is the petitboot kernel.
As OPAL allows registering an address with it in the first kernel and
retrieving it after MPIPL, use it to store the top of boot memory.
A kernel that intends to preserve crash data retrieves it and avoids
using memory beyond this address.
Move arch_reserved_kernel_pages() function as it is needed for both
FA_DUMP and PRESERVE_FA_DUMP configurations.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821375751.5656.11459483669542541602.stgit@hbathini.in.ibm.com
This slightly improves the prom_init_check rule.
[1] Avoid needless check
Currently, prom_init_check.sh is invoked every time you run 'make'
even if you have changed nothing in prom_init.c. With this commit,
the script is re-run only when prom_init.o is recompiled.
[2] Beautify the build log
Currently, the O= build shows the absolute path to the script:
CALL /abs/path/to/source/of/linux/arch/powerpc/kernel/prom_init_check.sh
With this commit, it is always a relative path to the timestamp file:
PROMCHK arch/powerpc/kernel/prom_init_check
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190912074037.13813-1-yamada.masahiro@socionext.com
The ultracalls (ucalls for short) allow Secure Virtual Machines (SVMs)
and the hypervisor to request services from the ultravisor, such as
accessing a register or memory region that can only be accessed when
running in ultravisor-privileged mode.
This patch adds the ucall_norets() ultravisor call handler.
The specific service needed from a ucall is specified in register
R3 (the first parameter to the ucall). Other parameters to the
ucall, if any, are specified in registers R4 through R12.
The return value of all ucalls is in register R3. Other output values
from the ucall, if any, are returned in registers R4 through R12.
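Given that convention, the C-level interface is just a variadic
prototype; the handler body itself is essentially an 'sc 2' followed
by a 'blr' (a sketch):

/* Arguments in R3-R12 and return value in R3, per the convention above */
long ucall_norets(unsigned long opcode, ...);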
Each ucall returns specific error codes, applicable in the context
of the ucall. However, as with the PowerPC Architecture Platform
Reference (PAPR), if no specific error code is defined for a particular
situation, then the ucall will fall back to a parameter-position based
error code, i.e. U_PARAMETER, U_P2, U_P3, etc., depending on the ucall
parameter that may have caused the error.
Every host kernel (powernv) needs to be able to do ucalls in case it
ends up being run on a machine with an ultravisor enabled. Otherwise,
the kernel may crash early in boot trying to access ultravisor
resources, for instance, trying to set the partition table entry 0.
Secure guests also need to be able to do ucalls, and their kernel may
not have CONFIG_PPC_POWERNV=y. For that reason, the ucall.S file is
placed under arch/powerpc/kernel.
If the ultravisor is not enabled, the ucalls will be redirected to the
hypervisor, which must handle/fail the call.
Thanks to inputs from Ram Pai and Michael Anderson.
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190822034838.27876-3-cclaudio@linux.ibm.com
All the way back to the introduction of dma_common_mmap we've defaulted
to marking the pages as uncached. But this is wrong for DMA coherent
devices. Later on, DMA_ATTR_WRITE_COMBINE also got incorrect treatment,
as that flag is only treated specially on the alloc side for
non-coherent devices.
Introduce a new dma_pgprot helper that deals with the check for coherent
devices so that only the remapping cases ever reach arch_dma_mmap_pgprot
and we thus ensure no aliasing of page attributes happens, which makes
the powerpc version of arch_dma_mmap_pgprot obsolete and simplifies the
remaining ones.
Note that this means arch_dma_mmap_pgprot is a bit misnamed now, but
we'll phase it out soon.
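The helper looks roughly like this (a simplified sketch of the generic
code; the non-consistent-attr special case is omitted):

pgprot_t dma_pgprot(struct device *dev, pgprot_t prot, unsigned long attrs)
{
	/* Coherent devices keep normal cacheable attributes */
	if (dev_is_dma_coherent(dev))
		return prot;
	if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_MMAP_PGPROT))
		return arch_dma_mmap_pgprot(dev, prot, attrs);
	return pgprot_noncached(prot);
}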
Fixes: 64ccc9c033 ("common: dma-mapping: add support for generic dma_mmap_* calls")
Reported-by: Shawn Anastasio <shawn@anastas.io>
Reported-by: Gavin Li <git@thegavinli.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> # arm64
The refactor of powerpc DMA functions in commit 6666cc17d7
("powerpc/dma: remove dma_nommu_mmap_coherent") incorrectly
changes the way DMA mappings are handled on powerpc.
Since this change, all mapped pages are marked as cache-inhibited
through the default implementation of arch_dma_mmap_pgprot.
This differs from the previous behavior of only marking pages
in noncoherent mappings as cache-inhibited and has resulted in
sporadic system crashes in certain hardware configurations and
workloads (see Bugzilla).
This commit restores the previous correct behavior by providing
an implementation of arch_dma_mmap_pgprot that only marks
pages in noncoherent mappings as cache-inhibited. As this behavior
should be universal for all powerpc platforms, a new file,
dma-generic.c, was created to store it.
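The restored behaviour amounts to the following (a sketch of the new
implementation):

pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
			      unsigned long attrs)
{
	/* Only non-coherent mappings are marked cache-inhibited */
	if (!dev_is_dma_coherent(dev))
		return pgprot_noncached(prot);
	return prot;
}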
Fixes: 6666cc17d7 ("powerpc/dma: remove dma_nommu_mmap_coherent")
# NOTE: fixes commit 6666cc17d7 released in v5.1.
# Consider a stable tag:
# Cc: stable@vger.kernel.org # v5.1+
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190717235437.12908-1-shawn@anastas.io
If you compile with KVM but without CONFIG_HAVE_HW_BREAKPOINT you fail
at linking with:
arch/powerpc/kvm/book3s_hv_rmhandlers.o:(.text+0x708): undefined reference to `dawr_force_enable'
This was caused by commit c1fe190c06 ("powerpc: Add force enable of
DAWR on P9 option").
This moves a bunch of code around to fix this. It moves a lot of the
DAWR code into a new file and creates a new CONFIG_PPC_DAWR to enable
compiling it.
Fixes: c1fe190c06 ("powerpc: Add force enable of DAWR on P9 option")
Signed-off-by: Michael Neuling <mikey@neuling.org>
[mpe: Minor formatting in set_dawr()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
All files containing functions run before kasan_early_init() is called
must have KASAN instrumentation disabled.
For those files, branch profiling also has to be disabled, otherwise
each if () generates a call to ftrace_likely_update().
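Concretely, that means per-object kbuild flags of this shape for each
such file (the object name is illustrative):

KASAN_SANITIZE_early_32.o := n
CFLAGS_early_32.o += -DDISABLE_BRANCH_PROFILING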
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
In preparation for KASAN, move early_init() into a separate
file in order to allow deactivation of KASAN for that function.
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>