Commit Graph

7295 Commits

Author SHA1 Message Date
Geert Uytterhoeven
60083063b7 powerpc/time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
The PowerPC time code is not a clock provider, and just needs to call
of_clk_init().

Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.

Remove the #ifdef protecting the of_clk_init() call, as a stub is
available for the !CONFIG_COMMON_CLK case.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200213083804.24315-1-geert+renesas@glider.be
2020-04-03 16:17:22 +11:00
Linus Torvalds
bef7b2a7be Devicetree updates for v5.7:
- Unit test for overlays with GPIO hogs
 
 - Improve dma-ranges parsing to handle dma-ranges with multiple entries
 
 - Update dtc to upstream version v1.6.0-2-g87a656ae5ff9
 
 - Improve overlay error reporting
 
 - Device link support for power-domains and hwlocks bindings
 
 - Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
   PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
   ITE Tech
 
 - Add deprecated Marvell vendor prefix 'mrvl'
 
 - A bunch of binding conversions to DT schema continues. Of note, the
   common serial and USB connector bindings are converted.
 
 - Add more Arm CPU compatibles
 
 - Drop Mark Rutland as DT maintainer :(
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAl6GTJUQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw8vaD/9HDT0EEDRdfr8s5OiLHYQMSXlSExJgR+ON
 iLI4v19apWwyycil176RXuWL5fFxh9HHOKbxOL27mmfqSLdv/xe/ev6Z4oNLVZzm
 YiUalm37K+R9yWsWTqxJKtQxWqSBBtOlyJmUYnvxI1bjfF4PANKF+IMpQVDflj1w
 9UMixoPRFdlOoMXVmC7yT0sfE1B6REJIh0Qfa4pqDwnHRhs8imgoIYA+JwFnmFPr
 k5Q8xS2YCwopH/7W5aRLzyEI2dF6Rfy9FID8jfyNoKZLtBW6cIEkA4zTv41BZSuj
 4Zj2etELXAnkIWMZXXdeKSz8IoSOlgjsCR482QmzTRuoFdmVLoJgsgh9hO2+oK07
 vJDEVJnXKxq1pF6DVbYwuzNgPB1pRzg6Y8EitdH3SBgjsOQ/m4RpKaqN0MkQhYrB
 BI9PCiF7GKCzNfNl3b/Op+HSsZ6aJkOsQPzC89/Ww0mB8azypgAwvLsjte5R3hib
 fS8z+V+lxAAC0/M981TsPKJ96HNv4VXbLI3bY9qJBd2LB+NDortsF/OkUuObsh+6
 hVQCWnae5V5iq8WmMTFABfBAK5lHafop87edn6X33LOyZZQCT9svNz/zrpakXh0e
 /SBGTUudtjCOy/BLj1VQrb6tczEzos7Ij+bszeIBAcSzrWg8gtVPjw9TvuAo2b/b
 9nwR4yrm0Q==
 =db/l
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:

 - Unit test for overlays with GPIO hogs

 - Improve dma-ranges parsing to handle dma-ranges with multiple entries

 - Update dtc to upstream version v1.6.0-2-g87a656ae5ff9

 - Improve overlay error reporting

 - Device link support for power-domains and hwlocks bindings

 - Add vendor prefixes for Beacon, Topwise, ENE, Dell, SG Micro, Elida,
   PocketBook, Xiaomi, Linutronix, OzzMaker, Waveshare Electronics, and
   ITE Tech

 - Add deprecated Marvell vendor prefix 'mrvl'

 - A bunch of binding conversions to DT schema continues. Of note, the
   common serial and USB connector bindings are converted.

 - Add more Arm CPU compatibles

 - Drop Mark Rutland as DT maintainer :(

* tag 'devicetree-for-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (106 commits)
  MAINTAINERS: drop an old reference to stm32 pwm timers doc
  MAINTAINERS: dt: update etnaviv file reference
  dt-bindings: usb: dwc2: fix bindings for amlogic, meson-gxbb-usb
  dt-bindings: uniphier-system-bus: fix warning in the example
  dt-bindings: display: meson-vpu: fix indentation of reg-names' "items"
  dt-bindings: iio: Fix adi, ltc2983 uint64-matrix schema constraints
  dt-bindings: power: Fix example for power-domain
  dt-bindings: arm: Add some constraints for PSCI nodes
  of: some unittest overlays not untracked
  of: gpio unittest kfree() wrong object
  dt-bindings: phy: convert phy-rockchip-inno-usb2 bindings to yaml
  dt-bindings: serial: sh-sci: Convert to json-schema
  dt-bindings: serial: Document serialN aliases
  dt-bindings: thermal: tsens: Set 'additionalProperties: false'
  dt-bindings: thermal: tsens: Fix nvmem-cell-names schema
  dt-bindings: vendor-prefixes: Add Beacon vendor prefix
  dt-bindings: vendor-prefixes: Add Topwise
  of: of_private.h: Replace zero-length array with flexible-array member
  docs: dt: fix a broken reference to input.yaml
  docs: dt: fix references to ap806-system-controller.txt
  ...
2020-04-02 17:32:52 -07:00
Michal Suchanek
0a7601b6ff powerpc/64: make buildable without CONFIG_COMPAT
There are numerous references to 32bit functions in generic and 64bit
code so ifdef them out.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e5619617020ef3a1f54f0c076e7d74cb9ec9f3bf.1584699455.git.msuchanek@suse.de
2020-04-03 00:10:00 +11:00
Michal Suchanek
3dd4eb83a9 powerpc: move common register copy functions from signal_32.c to signal.c
These functions are required for 64bit as well.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9fd6d9b7c5e91fab21159fe23534a2f16b4962d3.1584699455.git.msuchanek@suse.de
2020-04-03 00:09:59 +11:00
Nicholas Piggin
0c89649a70 powerpc/64s: Fix doorbell wakeup msgclr optimisation
Commit 3282a3da25 ("powerpc/64: Implement soft interrupt replay in C")
broke the doorbell wakeup optimisation introduced by commit a9af97aa0a
("powerpc/64s: msgclr when handling doorbell exceptions from system
reset").

This patch restores the msgclr, in C code. It's now done in the system
reset wakeup path rather than doorbell interrupt replay where it used
to be, because it is always the right thing to do in the wakeup case,
but it may be rarely of use in other interrupt replay situations in
which case it's wasted work - we would have to run measurements to see
if that was a worthwhile optimisation, and I suspect it would not be.

The results are similar to those in the original commit, test on POWER8
of context_switch selftests benchmark with polling idle disabled (e.g.,
always nap, giving cross-CPU IPIs) gives the following results:

                                  broken           patched
  Different threads, same core:   317k/s           375k/s    +18.7%
  Different cores:                280k/s           282k/s     +1.0%

Fixes: 3282a3da25 ("powerpc/64: Implement soft interrupt replay in C")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200402121212.1118218-1-npiggin@gmail.com
2020-04-03 00:09:53 +11:00
Leonardo Bras
41b8426fdb powerpc/cputable: Remove unnecessary copy of cpu_spec->oprofile_type
Before checking for cpu_type == NULL, this same copy happens, so doing
it here will just write the same value to the t->oprofile_type
again.

Remove the repeated copy, as it is unnecessary.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
Reviewed-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200215053637.280880-1-leonardo@linux.ibm.com
2020-04-01 14:30:51 +11:00
Mike Rapoport
b77afad84e powerpc/32: drop unused ISA_DMA_THRESHOLD
The ISA_DMA_THRESHOLD variable is set by several platforms but never
referenced.
Remove it.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191125092033.20014-1-rppt@kernel.org
2020-04-01 14:30:50 +11:00
Michael Ellerman
ead983604c powerpc/vmlinux.lds: Explicitly retain .gnu.hash
Relocatable kernel builds produce a warning about .gnu.hash being an
orphan section:

  ld: warning: orphan section `.gnu.hash' from `linker stubs' being placed in section `.gnu.hash'

If we try to discard it the build fails:

  ld -EL -m elf64lppc -pie --orphan-handling=warn --build-id -o
    .tmp_vmlinux1 -T ./arch/powerpc/kernel/vmlinux.lds --whole-archive
    arch/powerpc/kernel/head_64.o arch/powerpc/kernel/entry_64.o
    ...
    sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
    --start-group lib/lib.a --end-group
  ld: could not find section .gnu.hash

So add an entry to explicitly retain it, as we do for .hash.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200227045933.22967-1-mpe@ellerman.id.au
2020-04-01 14:30:50 +11:00
Christophe Leroy
ccbed90b82 powerpc/ptrace: move ptrace_triggered() into hw_breakpoint.c
ptrace_triggered() is declared in asm/hw_breakpoint.h and
only needed when CONFIG_HW_BREAKPOINT is set, so move it
into hw_breakpoint.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8402c516023da1371953a65af7df2008758ea0c4.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy
da529d4739 powerpc/ptrace: create ppc_gethwdinfo()
Create ippc_gethwdinfo() to handle PPC_PTRACE_GETHWDBGINFO and
reduce ifdef mess

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/82fefcc1ec75b96cece792878217a5d85ecda0c2.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy
e08227d25a powerpc/ptrace: create ptrace_get_debugreg()
Create ptrace_get_debugreg() to handle PTRACE_GET_DEBUGREG and
reduce ifdef mess

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c1482c41a39cc216f4073a51070d8680f52d5054.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy
323a780ca1 powerpc/ptrace: split out ADV_DEBUG_REGS related functions.
Move ADV_DEBUG_REGS functions out of ptrace.c, into
ptrace-adv.c and ptrace-noadv.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Squash in fixup patch from Christophe]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e2bd7d275bd5933d848aad4fee3ca652a14d039b.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:49 +11:00
Christophe Leroy
6e0b79750c powerpc/ptrace: move register viewing functions out of ptrace.c
Create a dedicated ptrace-view.c file.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bfd8c3ed57c9057e4a5d3816737b5ee98c6f7e43.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy
7c1f8db019 powerpc/ptrace: split out TRANSACTIONAL_MEM related functions.
Move TRANSACTIONAL_MEM functions out of ptrace.c, into
ptrace-tm.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2d0ef3bb2610c0344bd42252c7134f429818c000.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy
60ef9dbd9d powerpc/ptrace: split out SPE related functions.
Move CONFIG_SPE functions out of ptrace.c, into
ptrace-spe.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0f17a331760310b5562fae3791cdd3cf9c64237b.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy
1b20773b00 powerpc/ptrace: split out ALTIVEC related functions.
Move CONFIG_ALTIVEC functions out of ptrace.c, into
ptrace-altivec.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/35dae891d01c817fca0fd6ab406a3a2c7bf07f60.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy
7b99ed4e8e powerpc/ptrace: split out VSX related functions.
Move CONFIG_VSX functions out of ptrace.c, into
ptrace-vsx.c and ptrace-novsx.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dc8e20c8c95b7e83add0c6dd48f9470628896c5c.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:48 +11:00
Christophe Leroy
963ae6b2ff powerpc/ptrace: drop PARAMETER_SAVE_AREA_OFFSET
PARAMETER_SAVE_AREA_OFFSET is not used, drop it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6dac2b49207647f75cbf0e6771a545e691f0fd93.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:47 +11:00
Christophe Leroy
f1763e623c powerpc/ptrace: drop unnecessary #ifdefs CONFIG_PPC64
Drop a bunch of #ifdefs CONFIG_PPC64 that are not vital.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/af38b87a7e1e3efe4f9b664eaeb029e6e7d69fdb.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:47 +11:00
Christophe Leroy
b3138536c8 powerpc/ptrace: remove unused header includes
Remove unused header includes in ptrace.c and ptrace32.c

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6276df0be87a4329c2bb46b3b0f02059ae9e70e6.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:47 +11:00
Christophe Leroy
da9a1c10e2 powerpc: Move ptrace into a subdirectory.
In order to allow splitting of ptrace depending on the different
CONFIG_ options, create a subdirectory dedicated to ptrace and move
ptrace.c and ptrace32.c into it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9ebcbe37834e9d447dd97f4381084795a673260c.1582848567.git.christophe.leroy@c-s.fr
2020-04-01 14:30:41 +11:00
Nicholas Piggin
5f0b6ac390 powerpc/64/syscall: Reconcile interrupts
This reconciles interrupts in the system call case like all other
interrupts. This allows system_call_common to be shared with the scv
system call implementation in a subsequent patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-31-npiggin@gmail.com
2020-04-01 13:42:14 +11:00
Nicholas Piggin
702f098052 powerpc/64s/exception: Remove lite interrupt return
Regular interrupt return restores NVGPRS whereas lite returns do not.
This is clumsy: most interrupts can return without restoring NVGPRS in
most of the time, but there are special cases that require it (when
registers have been modified by the kernel). So change interrupt
return to not restore NVGPRS, and have interrupt handlers restore them
explicitly in the cases that requires it.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-30-npiggin@gmail.com
2020-04-01 13:42:14 +11:00
Nicholas Piggin
6cc0c16d82 powerpc/64s: Implement interrupt exit logic in C
Implement the bulk of interrupt return logic in C. The asm return code
must handle a few cases: restoring full GPRs, and emulating stack
store.

The stack store emulation is significantly simplfied, rather than
creating a new return frame and switching to that before performing
the store, it uses the PACA to keep a scratch register around to
perform the store.

The asm return code is moved into 64e for now. The new logic has made
allowance for 64e, but I don't have a full environment that works well
to test it, and even booting in emulated qemu is not great for stress
testing. 64e shouldn't be too far off working with this, given a bit
more testing and auditing of the logic.

This is slightly faster on a POWER9 (page fault speed increases about
1.1%), probably due to reduced mtmsrd.

mpe: Includes fixes from Nick for _TIF_EMULATE_STACK_STORE
handling (including the fast_interrupt_return path), to remove
trace_hardirqs_on(), and fixes the interrupt-return part of the
MSR_VSX restore bug caught by tm-unavailable selftest.

mpe: Incorporate fix from Nick:

The return-to-kernel path has to replay any soft-pending interrupts if
it is returning to a context that had interrupts soft-enabled. It has
to do this carefully and avoid plain enabling interrupts if this is an
irq context, which can cause multiple nesting of interrupts on the
stack, and other unexpected issues.

The code which avoided this case got the soft-mask state wrong, and
marked interrupts as enabled before going around again to retry. This
seems to be mostly harmless except when PREEMPT=y, this calls
preempt_schedule_irq with irqs apparently enabled and runs into a BUG
in kernel/sched/core.c

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-29-npiggin@gmail.com
2020-04-01 13:42:14 +11:00
Nicholas Piggin
3282a3da25 powerpc/64: Implement soft interrupt replay in C
When local_irq_enable() finds a pending soft-masked interrupt, it
"replays" it by setting up registers like the initial interrupt entry,
then calls into the low level handler to set up an interrupt stack
frame and process the interrupt.

This is not necessary, and uses more stack than needed. The high level
interrupt handler can be called directly from C, with just pt_regs set
up on stack. This should be faster and use less stack.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-28-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin
993c670a4d powerpc/64/syscall: Zero volatile registers when returning
Kernel addresses and potentially other sensitive data could be leaked
in volatile registers after a syscall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-27-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin
68b34588e2 powerpc/64/sycall: Implement syscall entry/exit logic in C
System call entry and particularly exit code is beyond the limit of
what is reasonable to implement in asm.

This conversion moves all conditional branches out of the asm code,
except for the case that all GPRs should be restored at exit.

Null syscall test is about 5% faster after this patch, because the
exit work is handled under local_irq_disable, and the hard mask and
pending interrupt replay is handled after that, which avoids games
with MSR.

mpe: Includes subsequent fixes from Nick:

This fixes 4 issues caught by TM selftests. First was a tm-syscall bug
that hit due to tabort_syscall being called after interrupts were
reconciled (in a subsequent patch), which led to interrupts being
enabled before tabort_syscall was called. Rather than going through an
un-reconciling interrupts for the return, I just go back to putting
the test early in asm, the C-ification of that wasn't a big win
anyway.

Second is the syscall return _TIF_USER_WORK_MASK check would go into
an infinite loop if _TIF_RESTORE_TM became set. The asm code uses
_TIF_USER_WORK_MASK to brach to slowpath which includes
restore_tm_state.

Third is system call return was not calling restore_tm_state, I missed
this completely (alhtough it's in the return from interrupt C
conversion because when the asm syscall code encountered problems it
would branch to the interrupt return code.

Fourth is MSR_VEC missing from restore_math, which was caught by
tm-unavailable selftest taking an unexpected facility unavailable
interrupt when testing VSX unavailble exception with MSR.FP=1
MSR.VEC=1. Fourth case also has a fixup in a subsequent patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-26-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin
965dd3ad30 powerpc/64/syscall: Remove non-volatile GPR save optimisation
powerpc has an optimisation where interrupts avoid saving the
non-volatile (or callee saved) registers to the interrupt stack frame
if they are not required.

Two problems with this are that an interrupt does not always know
whether it will need non-volatiles; and if it does need them, they can
only be saved from the entry-scoped asm code (because we don't control
what the C compiler does with these registers).

system calls are the most difficult: some system calls always require
all registers (e.g., fork, to copy regs into the child). Sometimes
registers are only required under certain conditions (e.g., tracing,
signal delivery). These cases require ugly logic in the call
chains (e.g., ppc_fork), and require a lot of logic to be implemented
in asm.

So remove the optimisation for system calls, and always save NVGPRs on
entry. Modern high performance CPUs are not so sensitive, because the
stores are dense in cache and can be hidden by other expensive work in
the syscall path -- the null syscall selftests benchmark on POWER9 is
not slowed (124.40ns before and 123.64ns after, i.e., within the
noise).

Other interrupts retain the NVGPR optimisation for now.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-24-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin
71c3b05a80 powerpc/64s/exception: Soft NMI interrupt should not use ret_from_except
The soft NMI handler does not reconcile interrupt state, so it should
not return via the normal ret_from_except path. Return like other NMIs,
using the EXCEPTION_RESTORE_REGS macro.

This becomes important when the scv interrupt is implemented, which
must handle soft-masked interrupts that have r13 set to something
other than the PACA -- returning to kernel in this case must restore
r13.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-23-npiggin@gmail.com
2020-04-01 13:42:13 +11:00
Nicholas Piggin
b44fc96d7b powerpc/64s/exception: Reconcile interrupts in system_reset
This adds IRQ_HARD_DIS to irq_happened. Although it doesn't seem to
matter much because we're not allowed to enable irqs in an NMI
handler, the soft-irq debugging code is becoming more strict about
ensuring IRQ_HARD_DIS is in sync with MSR[EE], this may help avoid
asserts or other issues.

Add a comment explaining why MCE does not have this. Early machine
check is generally much smaller and more contained code which will
explode if you look at it wrong anyway as it runs in real mode, though
there's an argument that we should do similar reconciling for the MCE
as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-22-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin
2284ffea8f powerpc/64s/exception: Only test KVM in SRR interrupts when PR KVM is supported
Apart from SRESET, MCE, and syscall (hcall variant), the SRR type
interrupts are not escalated to hypervisor mode, so are delivered to
the OS.

When running PR KVM, the OS is the hypervisor, and the guest runs with
MSR[PR]=1 (ie. usermode), so these interrupts must test if a guest was
running when interrupted. These tests are required at the real-mode
entry points because the PR KVM host runs with LPCR[AIL]=0.

In HV KVM and nested HV KVM, the guest always receives these
interrupts, so there is no need for the host to make this test. So
remove the tests if PR KVM is not configured.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-21-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin
94325357e8 powerpc/64s/exception: Add more comments for interrupt handlers
A few of the non-standard handlers are left uncommented. Some more
description could be added to some.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-20-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin
3f7fbd97d0 powerpc/64s/exception: Clean up SRR specifiers
Remove more magic numbers and replace with nicely named bools.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-19-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin
689e732262 powerpc/64s/exception: Re-inline some handlers
The reduction in interrupt entry size allows some handlers to be
re-inlined.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-18-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin
2babd6ea43 powerpc/64s/exception: Avoid touching the stack in hdecrementer
The hdec interrupt handler is reported to sometimes fire in Linux if
KVM leaves it pending after a guest exists. This is harmless, so there
is a no-op handler for it.

The interrupt handler currently uses the regular kernel stack. Change
this to avoid touching the stack entirely.

This should be the last place where the regular Linux stack can be
accessed with asynchronous interrupts (including PMI) soft-masked.
It might be possible to take advantage of this invariant, e.g., to
context switch the kernel stack SLB entry without clearing MSR[EE].

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-17-npiggin@gmail.com
2020-04-01 13:42:12 +11:00
Nicholas Piggin
9d598f9344 powerpc/64s/exception: Trim unused arguments from KVMTEST macro
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-16-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin
931dc86b3a powerpc/64s/exception: Remove the SPR saving patch code macros
These are used infrequently enough they don't provide much help, so
inline them.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-15-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin
d73a10cbf9 powerpc/64s/exception: Remove confusing IEARLY option
Replace IEARLY=1 and IEARLY=2 with IBRANCH_COMMON, which controls if
the entry code branches to a common handler; and IREALMODE_COMMON,
which controls whether the common handler should remain in real mode.

These special cases no longer avoid loading the SRR registers, there
is no point as most of them load the registers immediately anyway.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-14-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin
9600f261ac powerpc/64s/exception: Move KVM test to common code
This allows more code to be moved out of unrelocated regions. The
system call KVMTEST is changed to be open-coded and remain in the
tramp area to avoid having to move it to entry_64.S. The custom nature
of the system call entry code means the hcall case can be made more
streamlined than regular interrupt handlers.

mpe: Incorporate fix from Nick:

Moving KVM test to the common entry code missed the case of HMI and
MCE, which do not do __GEN_COMMON_ENTRY (because they don't want to
switch to virt mode).

This means a MCE or HMI exception that is taken while KVM is running a
guest context will not be switched out of that context, and KVM won't
be notified. Found by running sigfuz in guest with patched host on
POWER9 DD2.3, which causes some TM related HMI interrupts (which are
expected and supposed to be handled by KVM).

This fix adds a __GEN_REALMODE_COMMON_ENTRY for those handlers to add
the KVM test. This makes them look a little more like other handlers
that all use __GEN_COMMON_ENTRY.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-13-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin
0eddf327e1 powerpc/64s/exception: Move soft-mask test to common code
As well as moving code out of the unrelocated vectors, this allows the
masked handlers to be moved to common code, and allows the soft_nmi
handler to be generated more like a regular handler.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-12-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin
8729c26e67 powerpc/64s/exception: Move real to virt switch into the common handler
The real mode interrupt entry points currently use rfid to branch to
the common handler in virtual mode. This is a significant amount of
code, and forces other code (notably the KVM test) to live in the
real mode handler.

In the interest of minimising the amount of code that runs unrelocated
move the switch to virt mode into the common code, and do it with
mtmsrd, which avoids clobbering SRRs (although the post-KVMTEST
performance of real-mode interrupt handlers is not a big concern these
days).

This requires CTR to always be saved (real-mode needs to reach 0xc...)
but that's not a huge impact these days. It could be optimized away in
future.

mpe: Incorporate fix from Nick:

It's possible for interrupts to be replayed when TM is enabled and
suspended, for example rt_sigreturn, where the mtmsrd MSR_KERNEL in
the real-mode entry point to the common handler causes a TM Bad Thing
exception (due to attempting to clear suspended).

The fix for this is to have replay interrupts go to the _virt entry
point and skip the mtmsrd, which matches what happens before this
patch.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-11-npiggin@gmail.com
2020-04-01 13:42:11 +11:00
Nicholas Piggin
a3cd35be6e powerpc/64s/exception: Add ISIDE option
Rather than using DAR=2 to select the i-side registers, add an
explicit option.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-10-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin
b177ae2f8c powerpc/64s/exception: Remove old INT_KVM_HANDLER
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-9-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin
6d71759a74 powerpc/64s/exception: Remove old INT_COMMON macro
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-8-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin
fc589ee416 powerpc/64s/exception: Remove old INT_ENTRY macro
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-7-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin
4f50541f67 powerpc/64s/exception: Move all interrupt handlers to new style code gen macros
Aside from label names and BUG line numbers, the generated code change
is an additional HMI KVM handler added for the "late" KVM handler,
because early and late HMI generation is achieved by defining two
different interrupt types.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-6-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin
eb204d863b powerpc/64s/exception: Expand EXC_COMMON and EXC_COMMON_ASYNC macros
These don't provide a large amount of code sharing. Removing them
makes code easier to shuffle around. For example, some of the common
instructions will be moved into the common code gen macro.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-5-npiggin@gmail.com
2020-04-01 13:42:10 +11:00
Nicholas Piggin
d52fd3d31b powerpc/64s/exception: Add GEN_KVM macro that uses INT_DEFINE parameters
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-4-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Nicholas Piggin
7cb3a1a03e powerpc/64s/exception: Add GEN_COMMON macro that uses INT_DEFINE parameters
No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-3-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Nicholas Piggin
a42a239db3 powerpc/64s/exception: Introduce INT_DEFINE parameter block for code generation
The code generation macro arguments are difficult to read, and
defaults can't easily be used.

This introduces a block where parameters can be set for interrupt
handler code generation by the subsequent macros, and adds the first
generation macro for interrupt entry.

One interrupt handler is converted to the new macros to demonstrate
the change, the rest will be coverted all at once.

No generated code change.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225173541.1549955-2-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Nicholas Piggin
a2e366832f powerpc/64: mark emergency stacks valid to unwind
Before:

  WARNING: CPU: 0 PID: 494 at arch/powerpc/kernel/irq.c:343
  CPU: 0 PID: 494 Comm: a Tainted: G        W
  NIP:  c00000000001ed2c LR: c000000000d13190 CTR: c00000000003f910
  REGS: c0000001fffd3870 TRAP: 0700   Tainted: G        W
  MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
  CFAR: c00000000001ec90 IRQMASK: 0
  GPR00: c000000000aeb12c c0000001fffd3b00 c0000000012ba300 0000000000000000
  GPR04: 0000000000000000 0000000000000000 000000010bd207c8 6b00696e74657272
  GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000
  GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 000000010bd207bc
  GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50
  NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
  LR [c000000000d13190] _raw_spin_unlock_irqrestore+0x50/0xc0
  Call Trace:
  Instruction dump:
  60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
  60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000

After:

  WARNING: CPU: 0 PID: 499 at arch/powerpc/kernel/irq.c:343
  CPU: 0 PID: 499 Comm: a Not tainted
  NIP:  c00000000001ed2c LR: c000000000d13210 CTR: c00000000003f980
  REGS: c0000001fffd3870 TRAP: 0700   Not tainted
  MSR:  8000000000021003 <SF,ME,RI,LE>  CR: 28000488  XER: 00000000
  CFAR: c00000000001ec90 IRQMASK: 0
  GPR00: c000000000aeb1ac c0000001fffd3b00 c0000000012ba300 0000000000000000
  GPR04: 0000000000000000 0000000000000000 00000001347607c8 6b00696e74657272
  GPR08: 0000000000000000 0000000000000000 0000000000000000 efbeadde00000000
  GPR12: 0000000000000000 c0000000014a0000 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR24: 0000000000000000 0000000000000000 0000000000000000 00000001347607bc
  GPR28: 0000000000000000 c00000000148a898 0000000000000000 c0000001ffff3f50
  NIP [c00000000001ed2c] arch_local_irq_restore.part.0+0xac/0x100
  LR [c000000000d13210] _raw_spin_unlock_irqrestore+0x50/0xc0
  Call Trace:
  [c0000001fffd3b20] [c000000000aeb1ac] of_find_property+0x6c/0x90
  [c0000001fffd3b70] [c000000000aeb1f0] of_get_property+0x20/0x40
  [c0000001fffd3b90] [c000000000042cdc] rtas_token+0x3c/0x70
  [c0000001fffd3bb0] [c0000000000dc318] fwnmi_release_errinfo+0x28/0x70
  [c0000001fffd3c10] [c0000000000dcd8c] pseries_machine_check_realmode+0x1dc/0x540
  [c0000001fffd3cd0] [c00000000003fe04] machine_check_early+0x54/0x70
  [c0000001fffd3d00] [c000000000008384] machine_check_early_common+0x134/0x1f0
  --- interrupt: 200 at 0x1347607c8
      LR = 0x7fffafbd8328
  Instruction dump:
  60000000 7d2000a6 71298000 41820068 39200002 7d210164 4bffff9c 60000000
  60000000 7d2000a6 71298000 4c820020 <0fe00000> 4e800020 60000000 60000000

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200325104144.158362-1-npiggin@gmail.com
2020-04-01 13:42:09 +11:00
Michael Ellerman
c7def7fbde powerpc/64/tm: Don't let userspace set regs->trap via sigreturn
In restore_tm_sigcontexts() we take the trap value directly from the
user sigcontext with no checking:

	err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]);

This means we can be in the kernel with an arbitrary regs->trap value.

Although that's not immediately problematic, there is a risk we could
trigger one of the uses of CHECK_FULL_REGS():

	#define CHECK_FULL_REGS(regs)	BUG_ON(regs->trap & 1)

It can also cause us to unnecessarily save non-volatile GPRs again in
save_nvgprs(), which shouldn't be problematic but is still wrong.

It's also possible it could trick the syscall restart machinery, which
relies on regs->trap not being == 0xc00 (see 9a81c16b52 ("powerpc:
fix double syscall restarts")), though I haven't been able to make
that happen.

Finally it doesn't match the behaviour of the non-TM case, in
restore_sigcontext() which zeroes regs->trap.

So change restore_tm_sigcontexts() to zero regs->trap.

This was discovered while testing Nick's upcoming rewrite of the
syscall entry path. In that series the call to save_nvgprs() prior to
signal handling (do_notify_resume()) is removed, which leaves the
low-bit of regs->trap uncleared which can then trigger the FULL_REGS()
WARNs in setup_tm_sigcontexts().

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200401023836.3286664-1-mpe@ellerman.id.au
2020-04-01 13:42:00 +11:00
Ganesh Goudar
efbc4303b2 powerpc/pseries: Handle UE event for memcpy_mcsafe
memcpy_mcsafe has been implemented for power machines which is used
by pmem infrastructure, so that an UE encountered during memcpy from
pmem devices would not result in panic instead a right error code
is returned. The implementation expects machine check handler to ignore
the event and set nip to continue the execution from fixup code.

Appropriate changes are already made to powernv machine check handler,
make similar changes to pseries machine check handler to ignore the
the event and set nip to continue execution at the fixup entry if we
hit UE at an instruction with a fixup entry.

while we are at it, have a common function which searches the exception
table entry and updates nip with fixup address, and any future common
changes can be made in this function that are valid for both architectures.

powernv changes are made by
commit 895e3dceeb ("powerpc/mce: Handle UE event for memcpy_mcsafe")

Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Reviewed-by: Santosh S <santosh@fossix.org>
Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200326184916.31172-1-ganeshgr@linux.ibm.com
2020-03-27 14:59:35 +11:00
Michael Ellerman
c72e8da062 powerpc/smp: Use IS_ENABLED() to avoid #ifdef
We can avoid the #ifdef by using IS_ENABLED() in the existing
condition check.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313112020.28235-2-mpe@ellerman.id.au
2020-03-27 01:15:09 +11:00
Michael Ellerman
4b4d181d63 powerpc/smp: Drop superfluous NULL check
We don't need the NULL check of np, the result is the same because the
OF helpers cope with NULL, of_node_to_nid(NULL) == NUMA_NO_NODE (-1).

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200313112020.28235-1-mpe@ellerman.id.au
2020-03-27 01:15:09 +11:00
Nick Desaulniers
a7032637b5 powerpc: Prefer __section and __printf from compiler_attributes.h
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
[mpe: Drop changes to a/p/boot which doesn't use linux headers]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190812215052.71840-10-ndesaulniers@google.com
2020-03-27 00:16:32 +11:00
Masahiro Yamada
d198b34f38 .gitignore: add SPDX License Identifier
Add SPDX License Identifier to all .gitignore files.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-03-25 11:50:48 +01:00
Fabiano Rosas
7074695ac6 powerpc/prom_init: Remove leftover comment
The if statement that this comment referred to was removed in
commit 11fdb30934 ("powerpc/prom_init: Remove support for OPAL v2").

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200324182912.1048906-1-farosas@linux.ibm.com
2020-03-25 21:15:01 +11:00
Christophe Leroy
21f8b2fa3c powerpc/kprobes: Ignore traps that happened in real mode
When a program check exception happens while MMU translation is
disabled, following Oops happens in kprobe_handler() in the following
code:

	} else if (*addr != BREAKPOINT_INSTRUCTION) {

  BUG: Unable to handle kernel data access on read at 0x0000e268
  Faulting instruction address: 0xc000ec34
  Oops: Kernel access of bad area, sig: 11 [#1]
  BE PAGE_SIZE=16K PREEMPT CMPC885
  Modules linked in:
  CPU: 0 PID: 429 Comm: cat Not tainted 5.6.0-rc1-s3k-dev-00824-g84195dc6c58a #3267
  NIP:  c000ec34 LR: c000ecd8 CTR: c019cab8
  REGS: ca4d3b58 TRAP: 0300   Not tainted  (5.6.0-rc1-s3k-dev-00824-g84195dc6c58a)
  MSR:  00001032 <ME,IR,DR,RI>  CR: 2a4d3c52  XER: 00000000
  DAR: 0000e268 DSISR: c0000000
  GPR00: c000b09c ca4d3c10 c66d0620 00000000 ca4d3c60 00000000 00009032 00000000
  GPR08: 00020000 00000000 c087de44 c000afe0 c66d0ad0 100d3dd6 fffffff3 00000000
  GPR16: 00000000 00000041 00000000 ca4d3d70 00000000 00000000 0000416d 00000000
  GPR24: 00000004 c53b6128 00000000 0000e268 00000000 c07c0000 c07bb6fc ca4d3c60
  NIP [c000ec34] kprobe_handler+0x128/0x290
  LR [c000ecd8] kprobe_handler+0x1cc/0x290
  Call Trace:
  [ca4d3c30] [c000b09c] program_check_exception+0xbc/0x6fc
  [ca4d3c50] [c000e43c] ret_from_except_full+0x0/0x4
  --- interrupt: 700 at 0xe268
  Instruction dump:
  913e0008 81220000 38600001 3929ffff 91220000 80010024 bb410008 7c0803a6
  38210020 4e800020 38600000 4e800020 <813b0000> 6d2a7fe0 2f8a0008 419e0154
  ---[ end trace 5b9152d4cdadd06d ]---

kprobe is not prepared to handle events in real mode and functions
running in real mode should have been blacklisted, so kprobe_handler()
can safely bail out telling 'this trap is not mine' for any trap that
happened while in real-mode.

If the trap happened with MSR_IR or MSR_DR cleared, return 0
immediately.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Fixes: 6cc89bad60 ("powerpc/kprobes: Invoke handlers directly")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/424331e2006e7291a1bfe40e7f3fa58825f565e1.1582054578.git.christophe.leroy@c-s.fr
2020-03-25 12:09:51 +11:00
Oliver O'Halloran
e86350f70a powerpc/eeh: Rework eeh_ops->probe()
With the EEH early probe now being pseries specific there's no need for
eeh_ops->probe() to take a pci_dn. Instead, we can make it take a pci_dev
and use the probe function to map a pci_dev to an eeh_dev. This allows
the platform to implement it's own method for finding (or creating) an
eeh_dev for a given pci_dev which also removes a use of pci_dn in
generic EEH code.

This patch also renames eeh_device_add_late() to eeh_device_probe(). This
better reflects what it does does and removes the last vestiges of the
early/late EEH probe split.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-6-oohall@gmail.com
2020-03-25 12:09:39 +11:00
Oliver O'Halloran
b6eebb093c powerpc/eeh: Make early EEH init pseries specific
The eeh_ops->probe() function is called from two different contexts:

1. On pseries, where we set EEH_PROBE_MODE_DEVTREE, it's called in
   eeh_add_device_early() which is supposed to run before we create
   a pci_dev.

2. On PowerNV, where we set EEH_PROBE_MODE_DEV, it's called in
   eeh_device_add_late() which is supposed to run *after* the
   pci_dev is created.

The "early" probe is required because PAPR requires that we perform an RTAS
call to enable EEH support on a device before we start interacting with it
via config space or MMIO. This requirement doesn't exist on PowerNV and
shoehorning two completely separate initialisation paths into a common
interface just results in a convoluted code everywhere.

Additionally the early probe requires the probe function to take an pci_dn
rather than a pci_dev argument. We'd like to make pci_dn a pseries specific
data structure since there's no real requirement for them on PowerNV. To
help both goals move the early probe into the pseries containment zone
so the platform depedence is more explicit.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-5-oohall@gmail.com
2020-03-25 12:09:39 +11:00
Oliver O'Halloran
3ff32efb62 powerpc/eeh: Remove PHB check in probe
This check for a missing PHB has existing in various forms since the
initial PPC64 port was upstreamed in 2002. The idea seems to be that we
need to guard against creating pci-specific data structures for the non-pci
children of a PCI device tree node (e.g. USB devices). However, we only
create pci_dn structures for DT nodes that correspond to PCI devices so
there's not much point in doing this check in the eeh_probe path.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-4-oohall@gmail.com
2020-03-25 12:09:39 +11:00
Oliver O'Halloran
a4b4f61db8 powerpc/eeh: Do early EEH init only when required
The pci hotplug helper (pci_hp_add_devices()) calls
eeh_add_device_tree_early() to scan the device-tree for new PCI devices and
do the early EEH probe before the device is scanned. This early probe is a
no-op in a lot of cases because:

a) The early init is only required to satisfy a PAPR requirement that EEH
   be configured before we start doing config accesses. On PowerNV it is
   a no-op.

b) It's a no-op for devices that have already had their eeh_dev
   initialised.

There are four callers of pci_hp_add_devices():

1. arch/powerpc/kernel/eeh_driver.c
	Here the hotplug helper is called when re-scanning pci_devs that
	were removed during an EEH recovery pass. The EEH stat for each
	removed device (the eeh_dev) is retained across a recovery pass
	so the early init is a no-op in this case.

2. drivers/pci/hotplug/pnv_php.c
	This is also a no-op since the PowerNV hotplug driver is, suprisingly,
	PowerNV specific.

3. drivers/pci/hotplug/rpaphp_core.c
4. drivers/pci/hotplug/rpaphp_pci.c
	In these two cases new devices have been hotplugged and FW has
	provided new DT nodes for each. These are the only two cases where
	the EEH we might have new PCI device nodes in the DT so these are
	the only two cases where the early EEH probe needs to be done.

We can move the calls to eeh_add_device_tree_early() to the locations where
it's needed and remove it from the generic path. This is preparation for
making the early EEH probe pseries specific.

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-3-oohall@gmail.com
2020-03-25 12:09:38 +11:00
Oliver O'Halloran
2d0953f7d5 powerpc/eeh: Remove eeh_add_device_tree_late()
On pseries and PowerNV pcibios_bus_add_device() calls eeh_add_device_late()
so there's no need to do a separate tree traversal to bind the eeh_dev and
pci_dev together setting up the PHB at boot. As a result we can remove
eeh_add_device_tree_late().

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-2-oohall@gmail.com
2020-03-25 12:09:38 +11:00
Oliver O'Halloran
8645aaa879 powerpc/eeh: Add sysfs files in late probe
Move creating the EEH specific sysfs files into eeh_add_device_late()
rather than being open-coded all over the place. Calling the function is
generally done immediately after calling eeh_add_device_late() anyway. This
is also a correctness fix since currently the sysfs files will be added
even if the EEH probe happens to fail.

Similarly, on pseries we currently add the sysfs files before calling
eeh_add_device_late(). This is flat-out broken since the sysfs files
require the pci_dev->dev.archdata.edev pointer to be set, and that is done
in eeh_add_device_late().

Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200306073904.4737-1-oohall@gmail.com
2020-03-25 12:09:38 +11:00
Michael Ellerman
7053f80d96 powerpc/64: Prevent stack protection in early boot
The previous commit reduced the amount of code that is run before we
setup a paca. However there are still a few remaining functions that
run with no paca, or worse, with an arbitrary value in r13 that will
be used as a paca pointer.

In particular the stack protector canary is stored in the paca, so if
stack protector is activated for any of these functions we will read
the stack canary from wherever r13 points. If r13 happens to point
outside of memory we will get a machine check / checkstop.

For example if we modify initialise_paca() to trigger stack
protection, and then boot in the mambo simulator with r13 poisoned in
skiboot before calling the kernel:

  DEBUG: 19952232: (19952232): INSTRUCTION: PC=0xC0000000191FC1E8: [0x3C4C006D]: addis   r2,r12,0x6D [fetch]
  DEBUG: 19952236: (19952236): INSTRUCTION: PC=0xC00000001807EAD8: [0x7D8802A6]: mflr    r12 [fetch]
  FATAL ERROR: 19952276: (19952276): Check Stop for 0:0: Machine Check with ME bit of MSR off
  DEBUG: 19952276: (19952276): INSTRUCTION: PC=0xC0000000191FCA7C: [0xE90D0CF8]: ld      r8,0xCF8(r13) [Instruction Failed]
  INFO: 19952276: (19952277): ** Execution stopped: Mambo Error, Machine Check Stop,  **
  systemsim % bt
  pc:                             0xC0000000191FCA7C      initialise_paca+0x54
  lr:                             0xC0000000191FC22C      early_setup+0x44
  stack:0x00000000198CBED0        0x0     +0x0
  stack:0x00000000198CBF00        0xC0000000191FC22C      early_setup+0x44
  stack:0x00000000198CBF90        0x1801C968      +0x1801C968

So annotate the relevant functions to ensure stack protection is never
enabled for them.

Fixes: 06ec27aea9 ("powerpc/64: add stack protector support")
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200320032116.1024773-2-mpe@ellerman.id.au
2020-03-25 12:09:38 +11:00
Daniel Axtens
d4a8e98621 powerpc/64: Setup a paca before parsing device tree etc.
Currently we set up the paca after parsing the device tree for CPU
features. Prior to that, r13 contains random data, which means there
is random data in r13 while we're running the generic dt parsing code.

This random data varies depending on whether we boot through a vmlinux
or a zImage: for the vmlinux case it's usually around zero, but for
zImages we see random values like 912a72603d420015.

This is poor practice, and can also lead to difficult-to-debug
crashes. For example, when kcov is enabled, the kcov instrumentation
attempts to read preempt_count out of the current task, which goes via
the paca. This then crashes in the zImage case.

Similarly stack protector can cause crashes if r13 is bogus, by
reading from the stack canary in the paca.

To resolve this:

 - move the paca setup to before the CPU feature parsing.

 - because we no longer have access to CPU feature flags in paca
 setup, change the HV feature test in the paca setup path to consider
 the actual value of the MSR rather than the CPU feature.

Translations get switched on once we leave early_setup, so I think
we'd already catch any other cases where the paca or task aren't set
up.

Boot tested on a P9 guest and host.

Fixes: fb0b0a73b2 ("powerpc: Enable kcov")
Fixes: 06ec27aea9 ("powerpc/64: add stack protector support")
Cc: stable@vger.kernel.org # v4.20+
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
[mpe: Reword comments & change log a bit to mention stack protector]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200320032116.1024773-1-mpe@ellerman.id.au
2020-03-25 12:09:37 +11:00
Christophe Leroy
697ece78f8 powerpc/32s: reorder Linux PTE bits to better match Hash PTE bits.
Reorder Linux PTE bits to (almost) match Hash PTE bits.

RW Kernel : PP = 00
RO Kernel : PP = 00
RW User   : PP = 01
RO User   : PP = 11

So naturally, we should have
_PAGE_USER = 0x001
_PAGE_RW   = 0x002

Today 0x001 and 0x002 and _PAGE_PRESENT and _PAGE_HASHPTE which
both are software only bits.

Switch _PAGE_USER and _PAGE_PRESET
Switch _PAGE_RW and _PAGE_HASHPTE

This allows to remove a few insns.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c4d6c18a7f8d9d3b899bc492f55fbc40ef38896a.1583861325.git.christophe.leroy@c-s.fr
2020-03-25 12:09:27 +11:00
Christophe Leroy
eb4f8e259a powerpc/kprobes: Remove redundant code
At the time being we have something like

	if (something) {
		p = get();
		if (p) {
			if (something_wrong)
				goto out;
			...
			return;
		} else if (a != b) {
			if (some_error)
				goto out;
			...
		}
		goto out;
	}
	p = get();
	if (!p) {
		if (a != b) {
			if (some_error)
				goto out;
			...
		}
		goto out;
	}

This is similar to

	p = get();
	if (!p) {
		if (a != b) {
			if (some_error)
				goto out;
			...
		}
		goto out;
	}
	if (something) {
		if (something_wrong)
			goto out;
		...
		return;
	}

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Reflow the comment that was moved]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/07a17425743600460ce35fa9432d42487a825583.1582099499.git.christophe.leroy@c-s.fr
2020-03-25 12:09:08 +11:00
Michael Ellerman
6eeb9b3b9c powerpc/64s: Fix section mismatch warnings from boot code
We currently have two section mismatch warnings:

  The function __boot_from_prom() references
  the function __init prom_init().

  The function start_here_common() references
  the function __init start_kernel().

The warnings are correct, we do have branches from non-init code into
init code, which is freed after boot. But we don't expect to ever
execute any of that early boot code after boot, if we did that would
be a bug. In particular calling into OF after boot would be fatal
because OF is no longer resident.

So for now fix the warnings by marking the relevant functions as
__REF, which puts them in the ".ref.text" section.

This causes some reordering of the functions in the final link:

  @@ -217,10 +217,9 @@
   c00000000000b088 t generic_secondary_common_init
   c00000000000b124 t __mmu_off
   c00000000000b14c t __start_initialization_multiplatform
  -c00000000000b1ac t __boot_from_prom
  -c00000000000b1ec t __after_prom_start
  -c00000000000b260 t p_end
  -c00000000000b27c T copy_and_flush
  +c00000000000b1ac t __after_prom_start
  +c00000000000b220 t p_end
  +c00000000000b23c T copy_and_flush
   c00000000000b300 T __secondary_start
   c00000000000b300 t copy_to_here
   c00000000000b344 t start_secondary_prolog
  @@ -228,8 +227,9 @@
   c00000000000b36c t enable_64b_mode
   c00000000000b388 T relative_toc
   c00000000000b3a8 t p_toc
  -c00000000000b3b0 t start_here_common
  -c00000000000b3d0 t start_here_multiplatform
  +c00000000000b3b0 t __boot_from_prom
  +c00000000000b3f0 t start_here_multiplatform
  +c00000000000b480 t start_here_common
   c00000000000b880 T system_call_common
   c00000000000b974 t system_call
   c00000000000b9dc t system_call_exit

In particular __boot_from_prom moves after copy_to_here, which means
it's not copied to zero in the first stage of copy of the kernel to
zero.

But that's OK, because we only call __boot_from_prom before we do the
copy, so it makes no difference when it's copied. The call sequence
is:
  __start
  -> __start_initialization_multiplatform
     -> __boot_from_prom
        -> __start
           -> __start_initialization_multiplatform
              -> __after_prom_start
                 -> copy_and_flush
                 -> copy_and_flush (relocated to 0)
                    -> start_here_multiplatform
                       -> early_setup

Reported-by: Mauricio Faria de Oliveira <mauricfo@linux.ibm.com>
Reported-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200225031328.14676-1-mpe@ellerman.id.au
2020-03-25 12:07:59 +11:00
Alexey Kardashevskiy
74bb84e511 powerpc/prom_init: Pass the "os-term" message to hypervisor
The "os-term" RTAS calls has one argument with a message address of OS
termination cause. rtas_os_term() already passes it but the recently
added prom_init's version of that missed it; it also does not fill
args correctly.

This passes the message address and initializes the number of arguments.

Fixes: 6a9c930bd7 ("powerpc/prom_init: Add the ESM call to prom_init")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200312074404.87293-1-aik@ozlabs.ru
2020-03-25 12:07:58 +11:00
Fangrui Song
90ceddcb49 bpf: Support llvm-objcopy for vmlinux BTF
Simplify gen_btf logic to make it work with llvm-objcopy. The existing
'file format' and 'architecture' parsing logic is brittle and does not
work with llvm-objcopy/llvm-objdump.

'file format' output of llvm-objdump>=11 will match GNU objdump, but
'architecture' (bfdarch) may not.

.BTF in .tmp_vmlinux.btf is non-SHF_ALLOC. Add the SHF_ALLOC flag
because it is part of vmlinux image used for introspection. C code
can reference the section via linker script defined __start_BTF and
__stop_BTF. This fixes a small problem that previous .BTF had the
SHF_WRITE flag (objcopy -I binary -O elf* synthesized .data).

Additionally, `objcopy -I binary` synthesized symbols
_binary__btf_vmlinux_bin_start and _binary__btf_vmlinux_bin_stop (not
used elsewhere) are replaced with more commonplace __start_BTF and
__stop_BTF.

Add 2>/dev/null because GNU objcopy (but not llvm-objcopy) warns
"empty loadable segment detected at vaddr=0xffffffff81000000, is this intentional?"

We use a dd command to change the e_type field in the ELF header from
ET_EXEC to ET_REL so that lld will accept .btf.vmlinux.bin.o.  Accepting
ET_EXEC as an input file is an extremely rare GNU ld feature that lld
does not intend to support, because this is error-prone.

The output section description .BTF in include/asm-generic/vmlinux.lds.h
avoids potential subtle orphan section placement issues and suppresses
--orphan-handling=warn warnings.

Fixes: df786c9b94 ("bpf: Force .BTF section start to zero when dumping from vmlinux")
Fixes: cb0cc635c7 ("powerpc: Include .BTF section")
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Stanislav Fomichev <sdf@google.com>
Tested-by: Andrii Nakryiko <andriin@fb.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://github.com/ClangBuiltLinux/linux/issues/871
Link: https://lore.kernel.org/bpf/20200318222746.173648-1-maskray@google.com
2020-03-19 12:32:38 +01:00
Joe Lawrence
ffd3eaf178 powerpc/vdso: remove deprecated VDS64_HAS_DESCRIPTORS references
The original 2005 patch that introduced the powerpc vdso, pre-git
("ppc64: Implement a vDSO and use it for signal trampoline") notes that:

  ... symbols exposed by the vDSO aren't "normal" function symbols, apps
  can't be expected to link against them directly, the vDSO's are both
  seen as if they were linked at 0 and the symbols just contain offsets
  to the various functions.  This is done on purpose to avoid a
  relocation step (ppc64 functions normally have descriptors with abs
  addresses in them).  When glibc uses those functions, it's expected to
  use it's own trampolines that know how to reach them.

Despite that explanation, there remains dead #ifdef
VDS64_HAS_DESCRIPTORS code-blocks that provide alternate function
definitions that setup function descriptors.

Since VDS64_HAS_DESCRIPTORS has been unused for all these years, we
might as well finally remove it from the codebase.

Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200224211848.26087-1-joe.lawrence@redhat.com
2020-03-13 21:13:06 +11:00
Michael Ellerman
819723a8a2 Merge branch 'fixes' into next
Merge in our fixes branch. In particular we want to merge the TM and KUAP fixes,
so we can add selftests for them in next.
2020-03-10 15:16:42 +11:00
Srikar Dronamraju
247257b03b powerpc/numa: Remove late request for home node associativity
With commit ("powerpc/numa: Early request for home node associativity"),
commit 2ea6263068 ("powerpc/topology: Get topology for shared
processors at boot") which was requesting home node associativity
becomes redundant.

Hence remove the late request for home node associativity.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135301.24739-6-srikar@linux.vnet.ibm.com
2020-03-04 22:44:31 +11:00
Srikar Dronamraju
a05f0e5be4 powerpc/smp: Use nid as fallback for package_id
package_id is to match cores that are part of the same chip. On
PowerNV machines, package_id defaults to chip_id. However ibm,chip_id
property is not present in device-tree of PowerVM LPARs. Hence lscpu
output shows one core per socket and multiple cores.

To overcome this, use nid as the package_id on PowerVM LPARs.

Before the patch:

  Architecture:        ppc64le
  Byte Order:          Little Endian
  CPU(s):              128
  On-line CPU(s) list: 0-127
  Thread(s) per core:  8
  Core(s) per socket:  1                     <----------------------
  Socket(s):           16                    <----------------------
  NUMA node(s):        2
  Model:               2.2 (pvr 004e 0202)
  Model name:          POWER9 (architected), altivec supported
  Hypervisor vendor:   pHyp
  Virtualization type: para
  L1d cache:           32K
  L1i cache:           32K
  L2 cache:            512K
  L3 cache:            10240K
  NUMA node0 CPU(s):   0-63
  NUMA node1 CPU(s):   64-127
  #
  # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
  -1

After the patch:

  Architecture:        ppc64le
  Byte Order:          Little Endian
  CPU(s):              128
  On-line CPU(s) list: 0-127
  Thread(s) per core:  8                     <---------------------
  Core(s) per socket:  8                     <---------------------
  Socket(s):           2
  NUMA node(s):        2
  Model:               2.2 (pvr 004e 0202)
  Model name:          POWER9 (architected), altivec supported
  Hypervisor vendor:   pHyp
  Virtualization type: para
  L1d cache:           32K
  L1i cache:           32K
  L2 cache:            512K
  L3 cache:            10240K
  NUMA node0 CPU(s):   0-63
  NUMA node1 CPU(s):   64-127
  #
  # cat /sys/devices/system/cpu/cpu0/topology/physical_package_id
  0

Now lscpu output is more in line with the system configuration.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[mpe: Use pkg_id instead of ppid, tweak change log and comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200129135121.24617-1-srikar@linux.vnet.ibm.com
2020-03-04 22:44:30 +11:00
Christophe Leroy
532d43a73c powerpc/irq: Use current_stack_pointer in do_IRQ()
Until commit 7306e83ccf ("powerpc: Don't use CURRENT_THREAD_INFO to
find the stack"), the current stack base address was obtained by
calling current_thread_info(). That inline function was simply masking
out the value of r1.

In that commit, it was changed to using current_stack_pointer() (since
renamed current_stack_frame()), which is a heavier function as it is
an outline assembly function which cannot be inlined and which reads
the content of the stack at 0(r1).

Convert to using current_stack_pointer for geting r1 and masking out
its value to obtain the base address of the stack pointer as before.

Fixes: 7306e83ccf ("powerpc: Don't use CURRENT_THREAD_INFO to find the stack")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-5-mpe@ellerman.id.au
2020-03-04 22:44:30 +11:00
Christophe Leroy
0dec6e1cca powerpc/irq: use IS_ENABLED() in check_stack_overflow()
Instead of #ifdef, use IS_ENABLED(CONFIG_DEBUG_STACKOVERFLOW).
This enable GCC to check for code validity even when the option
is not selected.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-4-mpe@ellerman.id.au
2020-03-04 22:44:29 +11:00
Christophe Leroy
84ab148930 powerpc/irq: Use current_stack_pointer in check_stack_overflow()
The purpose of check_stack_overflow() is to verify that the stack has
not overflowed.

To really know whether the stack pointer is still within boundaries,
the check must be done directly on the value of r1.

So use current_stack_pointer, which returns the current value of r1,
rather than current_stack_frame() which causes a frame to be created
and then returns that value.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220115141.2707-3-mpe@ellerman.id.au
2020-03-04 22:44:28 +11:00
Michael Ellerman
3d13e839e8 powerpc: Rename current_stack_pointer() to current_stack_frame()
current_stack_pointer(), which was called __get_SP(), used to just
return the value in r1.

But that caused problems in some cases, so it was turned into a
function in commit bfe9a2cfe9 ("powerpc: Reimplement __get_SP() as a
function not a define").

Because it's a function in a separate compilation unit to all its
callers, it has the effect of causing a stack frame to be created, and
then returns the address of that frame. This is good in some cases
like those described in the above commit, but in other cases it's
overkill, we just need to know what stack page we're on.

On some other arches current_stack_pointer is just a register global
giving the stack pointer, and we'd like to do that too. So rename our
current_stack_pointer() to current_stack_frame() to make that
possible.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Link: https://lore.kernel.org/r/20200220115141.2707-1-mpe@ellerman.id.au
2020-03-04 22:44:28 +11:00
Kajol Jain
22697da36d powerpc/kernel/sysfs: Add new config option PMU_SYSFS to enable PMU SPRs sysfs file creation
Many of the performance monitoring unit (PMU) SPRs are
exposed in the sysfs. This may not be a desirable since
"perf" API is the primary interface to program PMU and
collect counter data in the system. But that said, we
cant remove these sysfs files since we dont whether
anyone/anything is using them.

So the patch adds a new CONFIG option 'CONFIG_PMU_SYSFS'
(user selectable) to be used in sysfs file creation for
PMU SPRs. New option by default is disabled, but can be
enabled if user needs it.

Tested this patch behaviour in powernv and pseries machines.
Patch is also tested for pmac32_defconfig.

Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Tested-by: Nageswara R Sastry <nasastry@in.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200214080606.26872-2-kjain@linux.ibm.com
2020-03-04 22:44:28 +11:00
Madhavan Srinivasan
fcdb524d44 powerpc/kernel/sysfs: Refactor current sysfs.c
An attempt to refactor the current sysfs.c file.
To start with a big chuck of macro #defines and dscr
functions are moved to start of the file. Secondly,
HAS_ #define macros are cleanup based on CONFIG_ options

Finally new HAS_ macro added:
1. HAS_PPC_PA6T (for PA6T) to separate out non-PMU SPRs.
2. HAS_PPC_PMC56 to separate out PMC SPR's from HAS_PPC_PMC_CLASSIC
   which come under CONFIG_PPC64.

Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200214080606.26872-1-kjain@linux.ibm.com
2020-03-04 22:44:28 +11:00
Greg Kroah-Hartman
860286cf33 powerpc/kernel: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200209105901.1620958-1-gregkh@linuxfoundation.org
2020-03-04 22:44:25 +11:00
Desnes A. Nunes do Rosario
fc37a1632d powerpc: fix hardware PMU exception bug on PowerVM compatibility mode systems
PowerVM systems running compatibility mode on a few Power8 revisions are
still vulnerable to the hardware defect that loses PMU exceptions arriving
prior to a context switch.

The software fix for this issue is enabled through the CPU_FTR_PMAO_BUG
cpu_feature bit, nevertheless this bit also needs to be set for PowerVM
compatibility mode systems.

Fixes: 68f2f0d431 ("powerpc: Add a cpu feature CPU_FTR_PMAO_BUG")
Signed-off-by: Desnes A. Nunes do Rosario <desnesn@linux.ibm.com>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200227134715.9715-1-desnesn@linux.ibm.com
2020-02-28 10:09:33 +11:00
Christophe Leroy
05642cf728 powerpc/32: don't restore r0, r6-r8 on exception entry path after trace_hardirqs_off()
Since commit b86fb88855 ("powerpc/32: implement fast entry for
syscalls on non BOOKE") and commit 1a4b739bbb ("powerpc/32:
implement fast entry for syscalls on BOOKE"), syscalls don't
use the exception entry path anymore. It is therefore pointless
to restore r0 and r6-r8 after calling trace_hardirqs_off().

In the meantime, drop the '2:' label which is unused and misleading.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2c6dc65d27e83964eb05f16a126161ab6455eea.1578388585.git.christophe.leroy@c-s.fr
2020-02-26 10:34:40 +11:00
Naveen N. Rao
cb0cc635c7 powerpc: Include .BTF section
Selecting CONFIG_DEBUG_INFO_BTF results in the below warning from ld:
  ld: warning: orphan section `.BTF' from `.btf.vmlinux.bin.o' being placed in section `.BTF'

Include .BTF section in vmlinux explicitly to fix the same.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200220113132.857132-1-naveen.n.rao@linux.vnet.ibm.com
2020-02-24 22:04:07 +11:00
Ravi Bangoria
e08658a657 powerpc/watchpoint: Don't call dar_within_range() for Book3S
DAR is set to the first byte of overlap between actual access and
watched range at DSI on Book3S processor. But actual access range
might or might not be within user asked range. So for Book3S, it
must not call dar_within_range().

This revert portion of commit 39413ae009 ("powerpc/hw_breakpoints:
Rewrite 8xx breakpoints to allow any address range size.").

Before patch:
  # ./tools/testing/selftests/powerpc/ptrace/perf-hwbreak
  ...
  TESTED: No overlap
  FAILED: Partial overlap: 0 != 2
  TESTED: Partial overlap
  TESTED: No overlap
  FAILED: Full overlap: 0 != 2
  failure: perf_hwbreak

After patch:
  TESTED: No overlap
  TESTED: Partial overlap
  TESTED: Partial overlap
  TESTED: No overlap
  TESTED: Full overlap
  success: perf_hwbreak

Fixes: 39413ae009 ("powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size.")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200222082049.330435-1-ravi.bangoria@linux.ibm.com
2020-02-24 11:19:35 +11:00
Christophe Leroy
c06f0aff03 powerpc: Don't use thread struct for saving SRR0/1 on syscall.
CR0 can be saved later, and CTR can also be used for saving.

Keep SRR1 in r9 and stash SRR0 in CTR, this avoids using thread_struct
in memory for that.

Saves 3 cycles (ie 1%) in null_syscall selftest on 8xx.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b94c3bc03bac9431fec2dadb686384c481889422.1580470483.git.christophe.leroy@c-s.fr
2020-02-19 22:46:08 +11:00
Christophe Leroy
9e27086292 powerpc/32: Warn and return ENOSYS on syscalls from kernel
Since commit b86fb88855 ("powerpc/32: implement fast entry for
syscalls on non BOOKE") and commit 1a4b739bbb ("powerpc/32:
implement fast entry for syscalls on BOOKE"), syscalls from
kernel are unexpected and can have catastrophic consequences
as it will destroy the kernel stack.

Test MSR_PR on syscall entry. In case syscall is from kernel,
emit a warning and return ENOSYS error.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/8ee3bdbbdfdfc64ca7001e90c43b2aee6f333578.1580470482.git.christophe.leroy@c-s.fr
2020-02-19 22:46:08 +11:00
Sourabh Jain
d8e73458f3 powerpc/fadump: sysfs for fadump memory reservation
Add a sys interface to allow querying the memory reserved by FADump for
saving the crash dump.

Also added Documentation/ABI for the new sysfs file.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191211160910.21656-7-sourabhjain@linux.ibm.com
2020-02-19 22:46:07 +11:00
Sourabh Jain
d418b19f34 powerpc/fadump: Reorganize /sys/kernel/fadump_* sysfs files
As the number of FADump sysfs files increases it is hard to manage all
of them inside /sys/kernel directory. It's better to have all the
FADump related sysfs files in a dedicated directory
/sys/kernel/fadump. But in order to maintain backward compatibility a
symlink has been added for every sysfs that has moved to new location.

As the FADump sysfs files are now part of a dedicated directory there
is no need to prefix their name with fadump_, hence sysfs file names
are also updated. For example fadump_enabled sysfs file is now
referred as enabled.

Also consolidate ABI documentation for all the FADump sysfs files in a
single file Documentation/ABI/testing/sysfs-kernel-fadump.

Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Tested-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191211160910.21656-4-sourabhjain@linux.ibm.com
2020-02-19 21:07:09 +11:00
Christophe Leroy
ba32f4b021 powerpc/process: Remove unneccessary #ifdef CONFIG_PPC64 in copy_thread_tls()
is_32bit_task() exists on both PPC64 and PPC32, no need of an ifdefery.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6ecbda05b4119c40222dc8ec284604e1597c9bff.1580327381.git.christophe.leroy@c-s.fr
2020-02-19 21:07:09 +11:00
Christophe Leroy
9eb425b2e0 powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S
Fixes: 12c3f1fd87 ("powerpc/32s: get rid of CPU_FTR_601 feature")
Cc: stable@vger.kernel.org # v5.4+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a99fc0ad65b87a1ba51cfa3e0e9034ee294c3e07.1582034961.git.christophe.leroy@c-s.fr
2020-02-19 10:35:22 +11:00
Christophe Leroy
477f3488a9 powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK
power_save_ppc32_restore() is called during exception entry, before
re-enabling the MMU. It substracts KERNELBASE from the address
of nap_save_msscr0 to access it.

With CONFIG_VMAP_STACK enabled, data MMU translation has already been
re-enabled, so power_save_ppc32_restore() has to access
nap_save_msscr0 by its virtual address.

Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Fixes: cd08f109e2 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7bce32ccbab3ba3e3e0f27da6961bf6313df97ed.1581663140.git.christophe.leroy@c-s.fr
2020-02-18 21:31:12 +11:00
Christophe Leroy
5a528eb679 powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK
With CONFIG_VMAP_STACK, data MMU has to be enabled
to read data on the stack.

Fixes: cd08f109e2 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d2330584f8c42d3039896e2b56f5d39676dc919c.1581669558.git.christophe.leroy@c-s.fr
2020-02-18 21:31:11 +11:00
Christophe Leroy
232ca1eeca powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK
hash_page() needs to read page tables from kernel memory. When entire
kernel memory is mapped by BATs, which is normally the case when
CONFIG_STRICT_KERNEL_RWX is not set, it works even if the page hosting
the page table is not referenced in the MMU hash table.

However, if the page where the page table resides is not covered by
a BAT, a DSI fault can be encountered from hash_page(), and it loops
forever. This can happen when CONFIG_STRICT_KERNEL_RWX is selected
and the alignment of the different regions is too small to allow
covering the entire memory with BATs. This also happens when
CONFIG_DEBUG_PAGEALLOC is selected or when booting with 'nobats'
flag.

Also, if the page containing the kernel stack is not present in the
MMU hash table, registers cannot be saved and a recursive DSI fault
is encountered.

To allow hash_page() to properly do its job at all time and load the
MMU hash table whenever needed, it must run with data MMU disabled.
This means it must be called before re-enabling data MMU. To allow
this, registers clobbered by hash_page() and create_hpte() have to
be saved in the thread struct together with SRR0, SSR1, DAR and DSISR.
It is also necessary to ensure that DSI prolog doesn't overwrite
regs saved by prolog of the current running exception. That means:
- DSI can only use SPRN_SPRG_SCRATCH0
- Exceptions must free SPRN_SPRG_SCRATCH0 before writing to the stack.

This also fixes the Oops reported by Erhard when create_hpte() is
called by add_hash_page().

Due to prolog size increase, a few more exceptions had to get split
in two parts.

Fixes: cd08f109e2 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
Reported-by: Erhard F. <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Tested-by: Erhard F. <erhard_f@mailbox.org>
Tested-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206501
Link: https://lore.kernel.org/r/64a4aa44686e9fd4b01333401367029771d9b231.1581761633.git.christophe.leroy@c-s.fr
2020-02-18 21:31:11 +11:00
Gustavo Luiz Duarte
2464cc4c34 powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery
After a treclaim, we expect to be in non-transactional state. If we
don't clear the current thread's MSR[TS] before we get preempted, then
tm_recheckpoint_new_task() will recheckpoint and we get rescheduled in
suspended transaction state.

When handling a signal caught in transactional state,
handle_rt_signal64() calls get_tm_stackpointer() that treclaims the
transaction using tm_reclaim_current() but without clearing the
thread's MSR[TS]. This can cause the TM Bad Thing exception below if
later we pagefault and get preempted trying to access the user's
sigframe, using __put_user(). Afterwards, when we are rescheduled back
into do_page_fault() (but now in suspended state since the thread's
MSR[TS] was not cleared), upon executing 'rfid' after completion of
the page fault handling, the exception is raised because a transition
from suspended to non-transactional state is invalid.

  Unexpected TM Bad Thing exception at c00000000000de44 (msr 0x8000000302a03031) tm_scratch=800000010280b033
  Oops: Unrecoverable exception, sig: 6 [#1]
  LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
  CPU: 25 PID: 15547 Comm: a.out Not tainted 5.4.0-rc2 #32
  NIP:  c00000000000de44 LR: c000000000034728 CTR: 0000000000000000
  REGS: c00000003fe7bd70 TRAP: 0700   Not tainted  (5.4.0-rc2)
  MSR:  8000000302a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[SE]>  CR: 44000884  XER: 00000000
  CFAR: c00000000000dda4 IRQMASK: 0
  PACATMSCRATCH: 800000010280b033
  GPR00: c000000000034728 c000000f65a17c80 c000000001662800 00007fffacf3fd78
  GPR04: 0000000000001000 0000000000001000 0000000000000000 c000000f611f8af0
  GPR08: 0000000000000000 0000000078006001 0000000000000000 000c000000000000
  GPR12: c000000f611f84b0 c00000003ffcb200 0000000000000000 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
  GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000f611f8140
  GPR24: 0000000000000000 00007fffacf3fd68 c000000f65a17d90 c000000f611f7800
  GPR28: c000000f65a17e90 c000000f65a17e90 c000000001685e18 00007fffacf3f000
  NIP [c00000000000de44] fast_exception_return+0xf4/0x1b0
  LR [c000000000034728] handle_rt_signal64+0x78/0xc50
  Call Trace:
  [c000000f65a17c80] [c000000000034710] handle_rt_signal64+0x60/0xc50 (unreliable)
  [c000000f65a17d30] [c000000000023640] do_notify_resume+0x330/0x460
  [c000000f65a17e20] [c00000000000dcc4] ret_from_except_lite+0x70/0x74
  Instruction dump:
  7c4ff120 e8410170 7c5a03a6 38400000 f8410060 e8010070 e8410080 e8610088
  60000000 60000000 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed0989
  ---[ end trace 93094aa44b442f87 ]---

The simplified sequence of events that triggers the above exception is:

  ...				# userspace in NON-TRANSACTIONAL state
  tbegin			# userspace in TRANSACTIONAL state
  signal delivery		# kernelspace in SUSPENDED state
  handle_rt_signal64()
    get_tm_stackpointer()
      treclaim			# kernelspace in NON-TRANSACTIONAL state
    __put_user()
      page fault happens. We will never get back here because of the TM Bad Thing exception.

  page fault handling kicks in and we voluntarily preempt ourselves
  do_page_fault()
    __schedule()
      __switch_to(other_task)

  our task is rescheduled and we recheckpoint because the thread's MSR[TS] was not cleared
  __switch_to(our_task)
    switch_to_tm()
      tm_recheckpoint_new_task()
        trechkpt			# kernelspace in SUSPENDED state

  The page fault handling resumes, but now we are in suspended transaction state
  do_page_fault()    completes
  rfid     <----- trying to get back where the page fault happened (we were non-transactional back then)
  TM Bad Thing			# illegal transition from suspended to non-transactional

This patch fixes that issue by clearing the current thread's MSR[TS]
just after treclaim in get_tm_stackpointer() so that we stay in
non-transactional state in case we are preempted. In order to make
treclaim and clearing the thread's MSR[TS] atomic from a preemption
perspective when CONFIG_PREEMPT is set, preempt_disable/enable() is
used. It's also necessary to save the previous value of the thread's
MSR before get_tm_stackpointer() is called so that it can be exposed
to the signal handler later in setup_tm_sigcontexts() to inform the
userspace MSR at the moment of the signal delivery.

Found with tm-signal-context-force-tm kernel selftest.

Fixes: 2b0a576d15 ("powerpc: Add new transactional memory state to the signal context")
Cc: stable@vger.kernel.org # v3.9
Signed-off-by: Gustavo Luiz Duarte <gustavold@linux.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200211033831.11165-1-gustavold@linux.ibm.com
2020-02-18 21:30:42 +11:00
Christophe Leroy
a4031afb9d powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss
In ITLB miss handled the line supposed to clear bits 20-23 on the L2
ITLB entry is buggy and does indeed nothing, leading to undefined
value which could allow execution when it shouldn't.

Properly do the clearing with the relevant instruction.

Fixes: 74fabcadfd ("powerpc/8xx: don't use r12/SPRN_SPRG_SCRATCH2 in TLB Miss handlers")
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Leonardo Bras <leonardo@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4f70c2778163affce8508a210f65d140e84524b4.1581272050.git.christophe.leroy@c-s.fr
2020-02-17 12:47:06 +11:00
Sam Bobroff
d4f194ed9e powerpc/eeh: Fix deadlock handling dead PHB
Recovering a dead PHB can currently cause a deadlock as the PCI
rescan/remove lock is taken twice.

This is caused as part of an existing bug in
eeh_handle_special_event(). The pe is processed while traversing the
PHBs even though the pe is unrelated to the loop. This causes the pe
to be, incorrectly, processed more than once.

Untangling this section can move the pe processing out of the loop and
also outside the locked section, correcting both problems.

Fixes: 2e25505147 ("powerpc/eeh: Fix crash when edev->pdev changes")
Cc: stable@vger.kernel.org # 5.4+
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com>
Tested-by: Frederic Barrat <fbarrat@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/0547e82dbf90ee0729a2979a8cac5c91665c621f.1581051445.git.sbobroff@linux.ibm.com
2020-02-17 12:47:05 +11:00
Rob Herring
6a9166b5be powerpc: Drop using struct of_pci_range.pci_space field
Let's use the struct of_pci_range.flags field instead so we can remove
the pci_space field.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
2020-02-14 14:25:20 -06:00
Christophe Leroy
d4bf905307 powerpc: Fix CONFIG_TRACE_IRQFLAGS with CONFIG_VMAP_STACK
When CONFIG_PROVE_LOCKING is selected together with (now default)
CONFIG_VMAP_STACK, kernel enter deadlock during boot.

At the point of checking whether interrupts are enabled or not, the
value of MSR saved on stack is read using the physical address of the
stack. But at this point, when using VMAP stack the DATA MMU
translation has already been re-enabled, leading to deadlock.

Don't use the physical address of the stack when
CONFIG_VMAP_STACK is set.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 028474876f ("powerpc/32: prepare for CONFIG_VMAP_STACK")
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/daeacdc0dec0416d1c587cc9f9e7191ad3068dc0.1581095957.git.christophe.leroy@c-s.fr
2020-02-08 21:49:06 +11:00
Linus Torvalds
71c3a888cb powerpc updates for 5.6
- Implement user_access_begin() and friends for our platforms that support
    controlling kernel access to userspace.
 
  - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.
 
  - Some tweaks to our pseries IOMMU code to allow SVMs ("secure" virtual
    machines) to use the IOMMU.
 
  - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit VDSO, and
    some other improvements.
 
  - A series to use the PCI hotplug framework to control opencapi card's so that
    they can be reset and re-read after flashing a new FPGA image.
 
 As well as other minor fixes and improvements as usual.
 
 Thanks to:
  Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy, Andrew Donnellan,
  Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen Zhou, Christophe Leroy,
  Frederic Barrat, Greg Kurz, Jason A. Donenfeld, Joel Stanley, Jordan Niethe,
  Julia Lawall, Krzysztof Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus
  Walleij, Michael Bringmann, Nathan Chancellor, Nicholas Piggin, Nick
  Desaulniers, Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy
  Dunlap, Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
  Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago Jung
  Bauermann, Tyrel Datwyler, Vaibhav Jain.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl44uJgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGIcD/9U3R2BK3trEPOStcUbYPte9sMqkyYq
 bcq4o2qrVc5deMvPhcHOQ4j28RUZOKoRODvSbXzGEGKIDlesmKjuP7AicE5qUjjV
 jRtsSOlRElXmPojAgrrlWrFDJOKbW5mFSj2TY/0sjVa06Wcu1Oi6WiQs/TazvZV/
 yzKh5lBL6xyQrmgH0h1VWWbblMbsA1bAL/D7m9Pgimpz0W6fOSRWgXILDUXPLBAy
 Rtt7p1218xPfhe66EgbLhWLIBJb70r+Z9yJNuVbp9NMJbDAhpfOuyMNXpRCELzXD
 5hwm0mFLOwxfSyBgIyIGokLRGFO6XL0uiZIG1Kp+tMxjgnNCmLlRs2R3EF1hoIWi
 49DHRAdK+IEggi6S4dXG5aglz6Rsun8pb/lN7uW+M68t3wp2IYQ+H8MQh4cxPTLu
 wX6KZr28lNG25yyp97nJq2Vld0xTxSSty92P8f588rkolyxzggUy0Xfen41szNrW
 9/bu8NWgt7qVtHmeUoCdWqiIiuMT1k3Of7AN4uAuS6aJHx2Fxr+03ZU5yNr8WIkm
 IOf27z8sUx3F8JL9cIuwAIPB0lSDPw1owvfiTYQ1VkzJa4Ko+kgv5wQ5Ors6V+ve
 XspE4osSP9T9PoHK2MVlu8mOjLpoo3Ibr849J0lGHQZDP6U3kHNILGfcXA8WP/9b
 Fgfh5Wj22cQe8A==
 =xpG+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "A pretty small batch for us, and apologies for it being a bit late, I
  wanted to sneak Christophe's user_access_begin() series in.

  Summary:

   - Implement user_access_begin() and friends for our platforms that
     support controlling kernel access to userspace.

   - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.

   - Some tweaks to our pseries IOMMU code to allow SVMs ("secure"
     virtual machines) to use the IOMMU.

   - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit
     VDSO, and some other improvements.

   - A series to use the PCI hotplug framework to control opencapi
     card's so that they can be reset and re-read after flashing a new
     FPGA image.

  As well as other minor fixes and improvements as usual.

  Thanks to: Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy,
  Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen
  Zhou, Christophe Leroy, Frederic Barrat, Greg Kurz, Jason A.
  Donenfeld, Joel Stanley, Jordan Niethe, Julia Lawall, Krzysztof
  Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus Walleij, Michael
  Bringmann, Nathan Chancellor, Nicholas Piggin, Nick Desaulniers,
  Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy Dunlap,
  Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
  Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago
  Jung Bauermann, Tyrel Datwyler, Vaibhav Jain"

* tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (131 commits)
  powerpc: configs: Cleanup old Kconfig options
  powerpc/configs/skiroot: Enable some more hardening options
  powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
  powerpc/configs/skiroot: Enable security features
  powerpc/configs/skiroot: Update for symbol movement only
  powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
  powerpc/configs/skiroot: Drop HID_LOGITECH
  powerpc/configs: Drop NET_VENDOR_HP which moved to staging
  powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
  powerpc/configs: Drop CONFIG_QLGE which moved to staging
  powerpc: Do not consider weak unresolved symbol relocations as bad
  powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
  powerpc: indent to improve Kconfig readability
  powerpc: Provide initial documentation for PAPR hcalls
  powerpc: Implement user_access_save() and user_access_restore()
  powerpc: Implement user_access_begin and friends
  powerpc/32s: Prepare prevent_user_access() for user_access_end()
  powerpc/32s: Drop NULL addr verification
  powerpc/kuap: Fix set direction in allow/prevent_user_access()
  powerpc/32s: Fix bad_kuap_fault()
  ...
2020-02-04 13:06:46 +00:00
Alexey Dobriyan
97a32539b9 proc: convert everything to "struct proc_ops"
The most notable change is DEFINE_SHOW_ATTRIBUTE macro split in
seq_file.h.

Conversion rule is:

	llseek		=> proc_lseek
	unlocked_ioctl	=> proc_ioctl

	xxx		=> proc_xxx

	delete ".owner = THIS_MODULE" line

[akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
[sfr@canb.auug.org.au: fix kernel/sched/psi.c]
  Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-04 03:05:26 +00:00
Linus Torvalds
83fa805bcb threads-v5.6
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXjFo8wAKCRCRxhvAZXjc
 omaGAQDVwCHQekqxp2eC8EJH4Pkt+Bn1BLrA25stlTo93YBPHgEAsPVUCRNcrZAl
 VncYmxCfpt3Yu0S/MTVXu5xrRiIXPQk=
 =uqTN
 -----END PGP SIGNATURE-----

Merge tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull thread management updates from Christian Brauner:
 "Sargun Dhillon over the last cycle has worked on the pidfd_getfd()
  syscall.

  This syscall allows for the retrieval of file descriptors of a process
  based on its pidfd. A task needs to have ptrace_may_access()
  permissions with PTRACE_MODE_ATTACH_REALCREDS (suggested by Oleg and
  Andy) on the target.

  One of the main use-cases is in combination with seccomp's user
  notification feature. As a reminder, seccomp's user notification
  feature was made available in v5.0. It allows a task to retrieve a
  file descriptor for its seccomp filter. The file descriptor is usually
  handed of to a more privileged supervising process. The supervisor can
  then listen for syscall events caught by the seccomp filter of the
  supervisee and perform actions in lieu of the supervisee, usually
  emulating syscalls. pidfd_getfd() is needed to expand its uses.

  There are currently two major users that wait on pidfd_getfd() and one
  future user:

   - Netflix, Sargun said, is working on a service mesh where users
     should be able to connect to a dns-based VIP. When a user connects
     to e.g. 1.2.3.4:80 that runs e.g. service "foo" they will be
     redirected to an envoy process. This service mesh uses seccomp user
     notifications and pidfd to intercept all connect calls and instead
     of connecting them to 1.2.3.4:80 connects them to e.g.
     127.0.0.1:8080.

   - LXD uses the seccomp notifier heavily to intercept and emulate
     mknod() and mount() syscalls for unprivileged containers/processes.
     With pidfd_getfd() more uses-cases e.g. bridging socket connections
     will be possible.

   - The patchset has also seen some interest from the browser corner.
     Right now, Firefox is using a SECCOMP_RET_TRAP sandbox managed by a
     broker process. In the future glibc will start blocking all signals
     during dlopen() rendering this type of sandbox impossible. Hence,
     in the future Firefox will switch to a seccomp-user-nofication
     based sandbox which also makes use of file descriptor retrieval.
     The thread for this can be found at
     https://sourceware.org/ml/libc-alpha/2019-12/msg00079.html

  With pidfd_getfd() it is e.g. possible to bridge socket connections
  for the supervisee (binding to a privileged port) and taking actions
  on file descriptors on behalf of the supervisee in general.

  Sargun's first version was using an ioctl on pidfds but various people
  pushed for it to be a proper syscall which he duely implemented as
  well over various review cycles. Selftests are of course included.
  I've also added instructions how to deal with merge conflicts below.

  There's also a small fix coming from the kernel mentee project to
  correctly annotate struct sighand_struct with __rcu to fix various
  sparse warnings. We've received a few more such fixes and even though
  they are mostly trivial I've decided to postpone them until after -rc1
  since they came in rather late and I don't want to risk introducing
  build warnings.

  Finally, there's a new prctl() command PR_{G,S}ET_IO_FLUSHER which is
  needed to avoid allocation recursions triggerable by storage drivers
  that have userspace parts that run in the IO path (e.g. dm-multipath,
  iscsi, etc). These allocation recursions deadlock the device.

  The new prctl() allows such privileged userspace components to avoid
  allocation recursions by setting the PF_MEMALLOC_NOIO and
  PF_LESS_THROTTLE flags. The patch carries the necessary acks from the
  relevant maintainers and is routed here as part of prctl()
  thread-management."

* tag 'threads-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim
  sched.h: Annotate sighand_struct with __rcu
  test: Add test for pidfd getfd
  arch: wire up pidfd_getfd syscall
  pid: Implement pidfd_getfd syscall
  vfs, fdtable: Add fget_task helper
2020-01-29 19:38:34 -08:00
Linus Torvalds
6aee4badd8 Merge branch 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull openat2 support from Al Viro:
 "This is the openat2() series from Aleksa Sarai.

  I'm afraid that the rest of namei stuff will have to wait - it got
  zero review the last time I'd posted #work.namei, and there had been a
  leak in the posted series I'd caught only last weekend. I was going to
  repost it on Monday, but the window opened and the odds of getting any
  review during that... Oh, well.

  Anyway, openat2 part should be ready; that _did_ get sane amount of
  review and public testing, so here it comes"

From Aleksa's description of the series:
 "For a very long time, extending openat(2) with new features has been
  incredibly frustrating. This stems from the fact that openat(2) is
  possibly the most famous counter-example to the mantra "don't silently
  accept garbage from userspace" -- it doesn't check whether unknown
  flags are present[1].

  This means that (generally) the addition of new flags to openat(2) has
  been fraught with backwards-compatibility issues (O_TMPFILE has to be
  defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
  kernels gave errors, since it's insecure to silently ignore the
  flag[2]). All new security-related flags therefore have a tough road
  to being added to openat(2).

  Furthermore, the need for some sort of control over VFS's path
  resolution (to avoid malicious paths resulting in inadvertent
  breakouts) has been a very long-standing desire of many userspace
  applications.

  This patchset is a revival of Al Viro's old AT_NO_JUMPS[3] patchset
  (which was a variant of David Drysdale's O_BENEATH patchset[4] which
  was a spin-off of the Capsicum project[5]) with a few additions and
  changes made based on the previous discussion within [6] as well as
  others I felt were useful.

  In line with the conclusions of the original discussion of
  AT_NO_JUMPS, the flag has been split up into separate flags. However,
  instead of being an openat(2) flag it is provided through a new
  syscall openat2(2) which provides several other improvements to the
  openat(2) interface (see the patch description for more details). The
  following new LOOKUP_* flags are added:

  LOOKUP_NO_XDEV:

     Blocks all mountpoint crossings (upwards, downwards, or through
     absolute links). Absolute pathnames alone in openat(2) do not
     trigger this. Magic-link traversal which implies a vfsmount jump is
     also blocked (though magic-link jumps on the same vfsmount are
     permitted).

  LOOKUP_NO_MAGICLINKS:

     Blocks resolution through /proc/$pid/fd-style links. This is done
     by blocking the usage of nd_jump_link() during resolution in a
     filesystem. The term "magic-links" is used to match with the only
     reference to these links in Documentation/, but I'm happy to change
     the name.

     It should be noted that this is different to the scope of
     ~LOOKUP_FOLLOW in that it applies to all path components. However,
     you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
     will *not* fail (assuming that no parent component was a
     magic-link), and you will have an fd for the magic-link.

     In order to correctly detect magic-links, the introduction of a new
     LOOKUP_MAGICLINK_JUMPED state flag was required.

  LOOKUP_BENEATH:

     Disallows escapes to outside the starting dirfd's
     tree, using techniques such as ".." or absolute links. Absolute
     paths in openat(2) are also disallowed.

     Conceptually this flag is to ensure you "stay below" a certain
     point in the filesystem tree -- but this requires some additional
     to protect against various races that would allow escape using
     "..".

     Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
     can trivially beam you around the filesystem (breaking the
     protection). In future, there might be similar safety checks done
     as in LOOKUP_IN_ROOT, but that requires more discussion.

  In addition, two new flags are added that expand on the above ideas:

  LOOKUP_NO_SYMLINKS:

     Does what it says on the tin. No symlink resolution is allowed at
     all, including magic-links. Just as with LOOKUP_NO_MAGICLINKS this
     can still be used with NOFOLLOW to open an fd for the symlink as
     long as no parent path had a symlink component.

  LOOKUP_IN_ROOT:

     This is an extension of LOOKUP_BENEATH that, rather than blocking
     attempts to move past the root, forces all such movements to be
     scoped to the starting point. This provides chroot(2)-like
     protection but without the cost of a chroot(2) for each filesystem
     operation, as well as being safe against race attacks that
     chroot(2) is not.

     If a race is detected (as with LOOKUP_BENEATH) then an error is
     generated, and similar to LOOKUP_BENEATH it is not permitted to
     cross magic-links with LOOKUP_IN_ROOT.

     The primary need for this is from container runtimes, which
     currently need to do symlink scoping in userspace[7] when opening
     paths in a potentially malicious container.

     There is a long list of CVEs that could have bene mitigated by
     having RESOLVE_THIS_ROOT (such as CVE-2017-1002101,
     CVE-2017-1002102, CVE-2018-15664, and CVE-2019-5736, just to name a
     few).

  In order to make all of the above more usable, I'm working on
  libpathrs[8] which is a C-friendly library for safe path resolution.
  It features a userspace-emulated backend if the kernel doesn't support
  openat2(2). Hopefully we can get userspace to switch to using it, and
  thus get openat2(2) support for free once it's ready.

  Future work would include implementing things like
  RESOLVE_NO_AUTOMOUNT and possibly a RESOLVE_NO_REMOTE (to allow
  programs to be sure they don't hit DoSes though stale NFS handles)"

* 'work.openat2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Documentation: path-lookup: include new LOOKUP flags
  selftests: add openat2(2) selftests
  open: introduce openat2(2) syscall
  namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
  namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
  namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
  namei: LOOKUP_NO_XDEV: block mountpoint crossing
  namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
  namei: LOOKUP_NO_SYMLINKS: block symlink resolution
  namei: allow set_root() to produce errors
  namei: allow nd_jump_link() to produce errors
  nsfs: clean-up ns_get_path() signature to return int
  namei: only return -ECHILD from follow_dotdot_rcu()
2020-01-29 11:20:24 -08:00
Linus Torvalds
ca9b5b6283 TTY/Serial driver updates for 5.6-rc1
Here are the big set of tty and serial driver updates for 5.6-rc1
 
 Included in here are:
 	- dummy_con cleanups (touches lots of arch code)
 	- sysrq logic cleanups (touches lots of serial drivers)
 	- samsung driver fixes (wasn't really being built)
 	- conmakeshash move to tty subdir out of scripts
 	- lots of small tty/serial driver updates
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXjFRBg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yn2VACgkge7vTeUNeZFc+6F4NWphAQ5tCQAoK/MMbU6
 0O8ef7PjFwCU4s227UTv
 =6m40
 -----END PGP SIGNATURE-----

Merge tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here are the big set of tty and serial driver updates for 5.6-rc1

  Included in here are:
   - dummy_con cleanups (touches lots of arch code)
   - sysrq logic cleanups (touches lots of serial drivers)
   - samsung driver fixes (wasn't really being built)
   - conmakeshash move to tty subdir out of scripts
   - lots of small tty/serial driver updates

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (140 commits)
  tty: n_hdlc: Use flexible-array member and struct_size() helper
  tty: baudrate: SPARC supports few more baud rates
  tty: baudrate: Synchronise baud_table[] and baud_bits[]
  tty: serial: meson_uart: Add support for kernel debugger
  serial: imx: fix a race condition in receive path
  serial: 8250_bcm2835aux: Document struct bcm2835aux_data
  serial: 8250_bcm2835aux: Use generic remapping code
  serial: 8250_bcm2835aux: Allocate uart_8250_port on stack
  serial: 8250_bcm2835aux: Suppress register_port error on -EPROBE_DEFER
  serial: 8250_bcm2835aux: Suppress clk_get error on -EPROBE_DEFER
  serial: 8250_bcm2835aux: Fix line mismatch on driver unbind
  serial_core: Remove unused member in uart_port
  vt: Correct comment documenting do_take_over_console()
  vt: Delete comment referencing non-existent unbind_con_driver()
  arch/xtensa/setup: Drop dummy_con initialization
  arch/x86/setup: Drop dummy_con initialization
  arch/unicore32/setup: Drop dummy_con initialization
  arch/sparc/setup: Drop dummy_con initialization
  arch/sh/setup: Drop dummy_con initialization
  arch/s390/setup: Drop dummy_con initialization
  ...
2020-01-29 10:13:27 -08:00
Vaibhav Jain
58b278f568 powerpc: Provide initial documentation for PAPR hcalls
This doc patch provides an initial description of the hcall op-codes
that are used by Linux kernel running as a guest (LPAR) on top of
PowerVM or any other sPAPR compliant hyper-visor (e.g qemu).

Apart from documenting the hcalls the doc-patch also provides a
rudimentary overview of how hcall ABI, how they are issued with the
Linux kernel and how information/control flows between the guest and
hypervisor.

Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add SPDX tag, add it to index.rst]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190828082729.16695-1-vaibhav@linux.ibm.com
2020-01-29 21:30:50 +11:00
Christophe Leroy
9933819099 powerpc/32s: Fix CPU wake-up from sleep mode
Commit f7354ccac8 ("powerpc/32: Remove CURRENT_THREAD_INFO and
rename TI_CPU") broke the CPU wake-up from sleep mode (i.e. when
_TLF_SLEEPING is set) by delaying the tovirt(r2, r2).

This is because r2 is not restored by fast_exception_return. It used
to work (by chance ?) because CPU wake-up interrupt never comes from
user, so r2 is expected to point to 'current' on return.

Commit e2fb9f5444 ("powerpc/32: Prepare for Kernel Userspace Access
Protection") broke it even more by clobbering r0 which is not
restored by fast_exception_return either.

Use r6 instead of r0. This is possible because r3-r6 are restored by
fast_exception_return and only r3-r5 are used for exception arguments.

For r2 it could be converted back to virtual address, but stay on the
safe side and restore it from the stack instead. It should be live
in the cache at that moment, so loading from the stack should make
no difference compared to converting it from phys to virt.

Fixes: f7354ccac8 ("powerpc/32: Remove CURRENT_THREAD_INFO and rename TI_CPU")
Fixes: e2fb9f5444 ("powerpc/32: Prepare for Kernel Userspace Access Protection")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/6d02c3ae6ad77af34392e98117e44c2bf6d13ba1.1580121710.git.christophe.leroy@c-s.fr
2020-01-28 23:01:19 +11:00
Christophe Leroy
cd08f109e2 powerpc/32s: Enable CONFIG_VMAP_STACK
A few changes to retrieve DAR and DSISR from struct regs
instead of retrieving them directly, as they may have
changed due to a TLB miss.

Also modifies hash_page() and friends to work with virtual
data addresses instead of physical ones. Same on load_up_fpu()
and load_up_altivec().

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Fix tovirt_vmstack call in head_32.S to fix CHRP build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2e2509a242fd5f3e23df4a06530c18060c4d321e.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:37:24 +11:00
Christophe Leroy
94dd54c51a powerpc/32s: Avoid crossing page boundary while changing SRR0/1.
Trying VMAP_STACK with KVM, vmlinux was not starting.
This was due to SRR0 and SRR1 clobbered by an ISI due to
the rfi being in a different page than the mtsrr0/1:

c0003fe0 <mmu_off>:
c0003fe0:       38 83 00 54     addi    r4,r3,84
c0003fe4:       7c 60 00 a6     mfmsr   r3
c0003fe8:       70 60 00 30     andi.   r0,r3,48
c0003fec:       4d 82 00 20     beqlr
c0003ff0:       7c 63 00 78     andc    r3,r3,r0
c0003ff4:       7c 9a 03 a6     mtsrr0  r4
c0003ff8:       7c 7b 03 a6     mtsrr1  r3
c0003ffc:       7c 00 04 ac     hwsync
c0004000:       4c 00 00 64     rfi

Align the 4 instruction block used to deactivate MMU to order 4,
so that the block never crosses a page boundary.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/30d2cda111b7977227fff067fa7e358440e2b3a4.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:37:03 +11:00
Christophe Leroy
2e15001ea9 powerpc/32s: Reorganise DSI handler.
The part decidated to handling hash_page() is fully unneeded for
processors not having real hash pages like the 603.

Lets enlarge the content of the feature fixup, and provide
an alternative which jumps directly instead of getting NIPs.

Also, in preparation of VMAP stacks, the end of DSI handler has moved
to later in the code as it won't fit anymore once VMAP stacks
are there.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c31b22c91af8b011d0a4fd9e52ad6afb4b593f71.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:59 +11:00
Christophe Leroy
99b229161f powerpc/8xx: Enable CONFIG_VMAP_STACK
This patch enables CONFIG_VMAP_STACK. For that, a few changes are
done in head_8xx.S.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d7ba1e34e80898310d6a314cbebe48baa32894ef.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:59 +11:00
Michael Ellerman
d52668f6b3 powerpc/8xx: Move tail of alignment exception out of line
When we enable VMAP_STACK there will not be enough room for the
alignment handler at 0x600 in head_8xx.S. For now move the tail of the
alignment handler out of line, and branch to it.

Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2020-01-27 22:36:59 +11:00
Christophe Leroy
afe1ec5ab8 powerpc/8xx: Split breakpoint exception
Breakpoint exception is big.

Split it to support future growth on exception prolog.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1dda3293d86d0f715b13b2633c95d2188a42a02c.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:55 +11:00
Christophe Leroy
596419afed powerpc/8xx: Move DataStoreTLBMiss perf handler
Move DataStoreTLBMiss perf handler in order to cope
with future growing exception prolog.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/75dd28b04efd2cbdbf01153173d99c11cdff2f08.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:50 +11:00
Christophe Leroy
9260f76ae2 powerpc/8xx: Drop exception entries for non-existing exceptions
head_8xx.S has entries for all exceptions from 0x100 to 0x1f00.
Several of them do not exist and are never generated by the 8xx
in accordance with the documentation.

Remove those entry points to make some room for future growing
exception code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/66f92866fe9524cf0f056016921c7d53adaef3a0.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:45 +11:00
Christophe Leroy
6edc318585 powerpc/8xx: Use alternative scratch registers in DTLB miss handler
In preparation of handling CONFIG_VMAP_STACK, DTLB miss handler need
to use different scratch registers than other exception handlers in
order to not jeopardise exception entry on stack DTLB misses.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c5287ea59ae9630f505019b309bf94029241635f.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:16 +11:00
Christophe Leroy
547db12fd8 powerpc/32: Use vmapped stacks for interrupts
In order to also catch overflows on IRQ stacks, use vmapped stacks.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d33ad1b36ddff4dcc19f96c592c12a61cf85d406.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:36:15 +11:00
Christophe Leroy
3978eb7851 powerpc/32: Add early stack overflow detection with VMAP stack.
To avoid recursive faults, stack overflow detection has to be
performed before writing in the stack in exception prologs.

Do it by checking the alignment. If the stack pointer alignment is
wrong, it means it is pointing to the following or preceding page.

Without VMAP stack, a stack overflow is catastrophic. With VMAP
stack, a stack overflow isn't destructive, so don't panic. Kill
the task with SIGSEGV instead.

A dedicated overflow stack is set up for each CPU.

  lkdtm: Performing direct entry EXHAUST_STACK
  lkdtm: Calling function with 512 frame size to depth 32 ...
  lkdtm: loop 32/32 ...
  lkdtm: loop 31/32 ...
  lkdtm: loop 30/32 ...
  lkdtm: loop 29/32 ...
  lkdtm: loop 28/32 ...
  lkdtm: loop 27/32 ...
  lkdtm: loop 26/32 ...
  lkdtm: loop 25/32 ...
  lkdtm: loop 24/32 ...
  lkdtm: loop 23/32 ...
  lkdtm: loop 22/32 ...
  lkdtm: loop 21/32 ...
  lkdtm: loop 20/32 ...
  Kernel stack overflow in process test[359], r1=c900c008
  Oops: Kernel stack overflow, sig: 6 [#1]
  BE PAGE_SIZE=4K MMU=Hash PowerMac
  Modules linked in:
  CPU: 0 PID: 359 Comm: test Not tainted 5.3.0-rc7+ #2225
  NIP:  c0622060 LR: c0626710 CTR: 00000000
  REGS: c0895f48 TRAP: 0000   Not tainted  (5.3.0-rc7+)
  MSR:  00001032 <ME,IR,DR,RI>  CR: 28004224  XER: 00000000
  GPR00: c0626ca4 c900c008 c783c000 c07335cc c900c010 c07335cc c900c0f0 c07335cc
  GPR08: c900c0f0 00000001 00000000 00000000 28008222 00000000 00000000 00000000
  GPR16: 00000000 00000000 10010128 10010000 b799c245 10010158 c07335cc 00000025
  GPR24: c0690000 c08b91d4 c068f688 00000020 c900c0f0 c068f668 c08b95b4 c08b91d4
  NIP [c0622060] format_decode+0x0/0x4d4
  LR [c0626710] vsnprintf+0x80/0x5fc
  Call Trace:
  [c900c068] [c0626ca4] vscnprintf+0x18/0x48
  [c900c078] [c007b944] vprintk_store+0x40/0x214
  [c900c0b8] [c007bf50] vprintk_emit+0x90/0x1dc
  [c900c0e8] [c007c5cc] printk+0x50/0x60
  [c900c128] [c03da5b0] recursive_loop+0x44/0x6c
  [c900c338] [c03da5c4] recursive_loop+0x58/0x6c
  [c900c548] [c03da5c4] recursive_loop+0x58/0x6c
  [c900c758] [c03da5c4] recursive_loop+0x58/0x6c
  [c900c968] [c03da5c4] recursive_loop+0x58/0x6c
  [c900cb78] [c03da5c4] recursive_loop+0x58/0x6c
  [c900cd88] [c03da5c4] recursive_loop+0x58/0x6c
  [c900cf98] [c03da5c4] recursive_loop+0x58/0x6c
  [c900d1a8] [c03da5c4] recursive_loop+0x58/0x6c
  [c900d3b8] [c03da5c4] recursive_loop+0x58/0x6c
  [c900d5c8] [c03da5c4] recursive_loop+0x58/0x6c
  [c900d7d8] [c03da5c4] recursive_loop+0x58/0x6c
  [c900d9e8] [c03da5c4] recursive_loop+0x58/0x6c
  [c900dbf8] [c03da5c4] recursive_loop+0x58/0x6c
  [c900de08] [c03da67c] lkdtm_EXHAUST_STACK+0x30/0x4c
  [c900de18] [c03da3e8] direct_entry+0xc8/0x140
  [c900de48] [c029fb40] full_proxy_write+0x64/0xcc
  [c900de68] [c01500f8] __vfs_write+0x30/0x1d0
  [c900dee8] [c0152cb8] vfs_write+0xb8/0x1d4
  [c900df08] [c0152f7c] ksys_write+0x58/0xe8
  [c900df38] [c0014208] ret_from_syscall+0x0/0x34
  --- interrupt: c01 at 0xf806664
      LR = 0x1000c868
  Instruction dump:
  4bffff91 80010014 7c832378 7c0803a6 38210010 4e800020 3d20c08a 3ca0c089
  8089a0cc 38a58f0c 38600001 4ba2d494 <9421ffe0> 7c0802a6 bfc10018 7c9f2378

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1b89c121b4070c7ee99e4f22cc178f15a736b07b.1576916812.git.christophe.leroy@c-s.fr
2020-01-27 22:35:49 +11:00
Christophe Leroy
63289e7d3a powerpc: align stack to 2 * THREAD_SIZE with VMAP_STACK
In order to ease stack overflow detection, align
stack to 2 * THREAD_SIZE when using VMAP_STACK.
This allows overflow detection using a single bit check.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/60e9ae86b7d2cdcf21468787076d345663648f46.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:09 +11:00
Christophe Leroy
028474876f powerpc/32: prepare for CONFIG_VMAP_STACK
To support CONFIG_VMAP_STACK, the kernel has to activate Data MMU
Translation for accessing the stack. Before doing that it must save
SRR0, SRR1 and also DAR and DSISR when relevant, in order to not
loose them in case there is a Data TLB Miss once the translation is
reactivated.

This patch adds fields in thread struct for saving those registers.
It prepares entry_32.S to handle exception entry with
Data MMU Translation enabled and alters EXCEPTION_PROLOG macros to
save SRR0, SRR1, DAR and DSISR then reenables Data MMU.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a775a1fea60f190e0f63503463fb775310a2009b.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:09 +11:00
Christophe Leroy
c9c84fd945 powerpc/32: add a macro to get and/or save DAR and DSISR on stack.
Refactor reading and saving of DAR and DSISR in exception vectors.

This will ease the implementation of VMAP stack.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1286b3e51b07727c6b4b05f2df9af3f9b1717fb5.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:09 +11:00
Christophe Leroy
5ae8fabc64 powerpc/32: move MSR_PR test into EXCEPTION_PROLOG_0
In order to simplify  VMAP stack implementation, move
MSR_PR test into EXCEPTION_PROLOG_0.

This requires to not modify cr0 between EXCEPTION_PROLOG_0
and EXCEPTION_PROLOG_1.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5c8b5bba692b92654dbd363a229a1ba91db725bb.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:09 +11:00
Christophe Leroy
1ca9db5b65 powerpc/32: save DEAR/DAR before calling handle_page_fault
handle_page_fault() is the only function that save DAR/DEAR itself.

Save DAR/DEAR before calling handle_page_fault() to prepare for
VMAP stack which will require to save even before.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3a4d58d378091086f00fde42b59610c80289e120.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:09 +11:00
Christophe Leroy
1f1c4d0122 powerpc/32: Add EXCEPTION_PROLOG_0 in head_32.h
This patch creates a macro for the very first part of
exception prolog, this will help when implementing
CONFIG_VMAP_STACK

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2249fe62f481121a180e9655ad2b998093f318f3.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:08 +11:00
Christophe Leroy
39bccfd164 powerpc/32: replace MTMSRD() by mtmsr
On PPC32, MTMSRD() is simply defined as mtmsr.

Replace MTMSRD(reg) by mtmsr reg in files dedicated to PPC32,
this makes the code less obscure.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/22469e78230edea3dbd0c79a555d73124f6c6d93.1576916812.git.christophe.leroy@c-s.fr
2020-01-26 22:15:08 +11:00
Jordan Niethe
736bcdd3a9 powerpc/mm: Remove kvm radix prefetch workaround for Power9 DD2.2
Commit a25bd72bad ("powerpc/mm/radix: Workaround prefetch issue with
KVM") introduced a number of workarounds as coming out of a guest with
the mmu enabled would make the cpu would start running in hypervisor
state with the PID value from the guest. The cpu will then start
prefetching for the hypervisor with that PID value.

In Power9 DD2.2 the cpu behaviour was modified to fix this. When
accessing Quadrant 0 in hypervisor mode with LPID != 0 prefetching will
not be performed. This means that we can get rid of the workarounds for
Power9 DD2.2 and later revisions. Add a new cpu feature
CPU_FTR_P9_RADIX_PREFETCH_BUG to indicate if the workarounds are needed.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191206031722.25781-1-jniethe5@gmail.com
2020-01-26 00:11:37 +11:00
Christophe Leroy
def0bfdbd6 powerpc: use probe_user_read() and probe_user_write()
Instead of opencoding, use probe_user_read() to failessly read
a user location and probe_user_write() for writing to user.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e041f5eedb23f09ab553be8a91c3de2087147320.1579800517.git.christophe.leroy@c-s.fr
2020-01-26 00:11:35 +11:00
Oliver O'Halloran
3ab3f3c9df powerpc/pci: Fold pcibios_setup_device() into pcibios_bus_add_device()
pcibios_bus_add_device() is the only caller of pcibios_setup_device().
Fold them together since there's no real reason to keep them separate.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200110070207.439-2-oohall@gmail.com
2020-01-23 21:31:21 +11:00
Oliver O'Halloran
4e0942c030 powerpc/eeh: Only dump stack once if an MMIO loop is detected
Many drivers don't check for errors when they get a 0xFFs response from an
MMIO load. As a result after an EEH event occurs a driver can get stuck in
a polling loop unless it some kind of internal timeout logic.

Currently EEH tries to detect and report stuck drivers by dumping a stack
trace after eeh_dev_check_failure() is called EEH_MAX_FAILS times on an
already frozen PE. The value of EEH_MAX_FAILS was chosen so that a dump
would occur every few seconds if the driver was spinning in a loop. This
results in a lot of spurious stack traces in the kernel log.

Fix this by limiting it to printing one stack trace for each PE freeze. If
the driver is truely stuck the kernel's hung task detector is better suited
to reporting the probelm anyway.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191016012536.22588-1-oohall@gmail.com
2020-01-23 21:31:20 +11:00
Oliver O'Halloran
a4af49f34f powerpc/pcidn: Warn when sriov pci_dn management is used incorrectly
These functions can only be used on a SR-IOV capable physical function and
they're only called in pcibios_sriov_enable / disable. Make them emit a
warning in the future if they're used incorrectly and remove the dead
code that checks if the device is a VF.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190821062655.19735-3-oohall@gmail.com
2020-01-23 21:31:19 +11:00
Oliver O'Halloran
8cd6aacc64 powerpc/pcidn: Make VF pci_dn management CONFIG_PCI_IOV specific
The powerpc PCI code requires that a pci_dn structure exists for all
devices in the system. This is fine for real devices since at boot a pci_dn
is created for each PCI device in the DT and it's fine for hotplugged devices
since the hotplug slot driver will manage the pci_dn's devices in hotplug
slots. For SR-IOV, we need the platform / pcibios to manage the pci_dn for
virtual functions since firmware is unaware of VFs, and they aren't
"hot plugged" in the traditional sense.

Management of the pci_dn is handled by the, poorly named, functions:
add_pci_dev_data() and remove_pci_dev_data(). The entire body of these
functions is #ifdef`ed around CONFIG_PCI_IOV and they cannot be used
in any other context, so make them only available when CONFIG_PCI_IOV
is selected, and rename them to reflect their actual usage rather than
having them masquerade as generic code.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190821062655.19735-2-oohall@gmail.com
2020-01-23 21:31:19 +11:00
Oliver O'Halloran
1fb4124ca9 powerpc/sriov: Remove VF eeh_dev state when disabling SR-IOV
When disabling virtual functions on an SR-IOV adapter we currently do not
correctly remove the EEH state for the now-dead virtual functions. When
removing the pci_dn that was created for the VF when SR-IOV was enabled
we free the corresponding eeh_dev without removing it from the child device
list of the eeh_pe that contained it. This can result in crashes due to the
use-after-free.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190821062655.19735-1-oohall@gmail.com
2020-01-23 21:31:19 +11:00
Oliver O'Halloran
3489cdc417 powerpc/eeh_sysfs: Make clearing EEH_DEV_SYSFS saner
The eeh_sysfs_remove_device() function is supposed to clear the
EEH_DEV_SYSFS flag since it indicates the EEH sysfs entries have been added
for a pci_dev.

When the sysfs files are removed eeh_remove_device() the eeh_dev and the
pci_dev have already been de-associated. This then causes the
pci_dev_to_eeh_dev() call in eeh_sysfs_remove_device() to return NULL so
the flag can't be cleared from the still-live eeh_dev. This problem is
worked around in the caller by clearing the flag manually. However, this
behaviour doesn't make a whole lot of sense, so this patch fixes it by:

a) Re-ordering eeh_remove_device() so that eeh_sysfs_remove_device() is
   called before de-associating the pci_dev and eeh_dev.

b) Making eeh_sysfs_remove_device() emit a warning if there's no
   corresponding eeh_dev for a pci_dev. The paths where the sysfs
   files are only reachable if EEH was setup for the device
   for the device in the first place so hitting this warning
   indicates a programming error.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-6-oohall@gmail.com
2020-01-23 21:31:19 +11:00
Oliver O'Halloran
758b423275 powerpc/eeh_sysfs: Remove double pci_dn lookup.
In eeh_notify_resume_show() the pci_dn for the device is looked up once in
the declaration block and then once after checking for a NULL eeh_dev.
Remove the second lookup since it's pointless.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-5-oohall@gmail.com
2020-01-23 21:31:19 +11:00
Oliver O'Halloran
4107248c56 powerpc/eeh_sysfs: ifdef pseries sr-iov sysfs properties
There are several EEH sysfs properties that only exists when the
"ibm,is-open-sriov-pf" property appears in the device tree node of the PCI
device. This used on pseries to indicate to the guest that the hypervisor
allows the guest to configure the SR-IOV capability. Doing this requires
some handshaking between the guest, hypervisor and userspace when a VF is
EEH frozen which is why these properties exist.

This is all dead code on non-pseries platforms so wrap it in an #ifdef
CONFIG_PPC_PSERIES to make the dependency clearer.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-4-oohall@gmail.com
2020-01-23 21:31:18 +11:00
Oliver O'Halloran
89f51839bd powerpc/eeh_sysfs: Fix incorrect comment
The EEH_ATTR_SHOW() helper is used to display fields from struct eeh_dev
not struct pci_dn.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-3-oohall@gmail.com
2020-01-23 21:31:18 +11:00
Oliver O'Halloran
b1268f4cdb powerpc/eeh_cache: Don't use pci_dn when inserting new ranges
At the point where we start inserting ranges into the EEH address cache the
binding between pci_dev and eeh_dev has already been set up. Instead of
consulting the pci_dn tree we can retrieve the eeh_dev directly using
pci_dev_to_eeh_dev().

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Reviewed-by: Sam Bobroff <sbobroff@linux.ibm.com>
Tested-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190715085612.8802-2-oohall@gmail.com
2020-01-23 21:31:18 +11:00
Christophe Leroy
bfc2eae0ad powerpc/vdso32: miscellaneous optimisations
Various optimisations by inverting branches and removing
redundant instructions.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b4e79f963845545bcce1459cd6fcfe46bdde7863.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:16 +11:00
Christophe Leroy
e33ffc956b powerpc/vdso32: implement clock_getres entirely
clock_getres returns hrtimer_res for all clocks but coarse ones
for which it returns KTIME_LOW_RES.

return EINVAL for unknown clocks.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/37f94e47c91070b7606fb3ec3fe6fd2302a475a0.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:15 +11:00
Christophe Leroy
6e2f9e9cfd powerpc/vdso32: use LOAD_REG_IMMEDIATE()
Use LOAD_REG_IMMEDIATE() to load registers with immediate value.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/36f111437e66e601929308f5d5dce230e1ce472f.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:15 +11:00
Christophe Leroy
2c29eef9fc powerpc/vdso32: Don't read cache line size from the datapage on PPC32.
On PPC32, the cache lines have a fixed size known at build time.

Don't read it from the datapage.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/dfa7b35e27e01964fcda84bf1ed8b2b31cf93826.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:15 +11:00
Christophe Leroy
ec0895f08f powerpc/vdso32: inline __get_datapage()
__get_datapage() is only a few instructions to retrieve the
address of the page where the kernel stores data to the VDSO.

By inlining this function into its users, a bl/blr pair and
a mflr/mtlr pair is avoided, plus a few reg moves.

The improvement is noticeable (about 55 nsec/call on an 8xx)

vdsotest before the patch:
gettimeofday:    vdso: 731 nsec/call
clock-gettime-realtime-coarse:    vdso: 668 nsec/call
clock-gettime-monotonic-coarse:    vdso: 745 nsec/call

vdsotest after the patch:
gettimeofday:    vdso: 677 nsec/call
clock-gettime-realtime-coarse:    vdso: 613 nsec/call
clock-gettime-monotonic-coarse:    vdso: 690 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c39ef7f3dfa25356b01e211d539671f279086c09.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:15 +11:00
Christophe Leroy
654abc69ef powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE
This is copied and adapted from commit 5c929885f1 ("powerpc/vdso64:
Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
from Santosh Sivaraj <santosh@fossix.org>

Benchmark from vdsotest-all:
clock-gettime-realtime: syscall: 3601 nsec/call
clock-gettime-realtime:    libc: 1072 nsec/call
clock-gettime-realtime:    vdso: 931 nsec/call
clock-gettime-monotonic: syscall: 4034 nsec/call
clock-gettime-monotonic:    libc: 1213 nsec/call
clock-gettime-monotonic:    vdso: 1076 nsec/call
clock-gettime-realtime-coarse: syscall: 2722 nsec/call
clock-gettime-realtime-coarse:    libc: 805 nsec/call
clock-gettime-realtime-coarse:    vdso: 668 nsec/call
clock-gettime-monotonic-coarse: syscall: 2949 nsec/call
clock-gettime-monotonic-coarse:    libc: 882 nsec/call
clock-gettime-monotonic-coarse:    vdso: 745 nsec/call

Additional test passed with:
	vdsotest -d 30 clock-gettime-monotonic-coarse verify

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/linuxppc/issues/issues/41
Link: https://lore.kernel.org/r/d1d24a376e396540194eeb85a2efe481e92ade24.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:15 +11:00
Christophe Leroy
902137ba8e powerpc/32: Add VDSO version of getcpu on non SMP
Commit 18ad51dd34 ("powerpc: Add VDSO version of getcpu") added
getcpu() for PPC64 only, by making use of a user readable general
purpose SPR.

PPC32 doesn't have any such SPR.

For non SMP, just return CPU id 0 from the VDSO directly.
PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0.

Before the patch, vdsotest reported:
getcpu: syscall: 1572 nsec/call
getcpu:    libc: 1787 nsec/call
getcpu:    vdso: not tested

Now, vdsotest reports:
getcpu: syscall: 1582 nsec/call
getcpu:    libc: 502 nsec/call
getcpu:    vdso: 187 nsec/call

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eaac4b6494ecff1811220fccc895bf282aab884a.1575273217.git.christophe.leroy@c-s.fr
2020-01-23 21:31:14 +11:00
Christophe Leroy
39413ae009 powerpc/hw_breakpoints: Rewrite 8xx breakpoints to allow any address range size.
Unlike standard powerpc, Powerpc 8xx doesn't have SPRN_DABR, but
it has a breakpoint support based on a set of comparators which
allow more flexibility.

Commit 4ad8622dc5 ("powerpc/8xx: Implement hw_breakpoint")
implemented breakpoints by emulating the DABR behaviour. It did
this by setting one comparator the match 4 bytes at breakpoint address
and the other comparator to match 4 bytes at breakpoint address + 4.

Rewrite 8xx hw_breakpoint to make breakpoints match all addresses
defined by the breakpoint address and length by making full use of
comparators.

Now, comparator E is set to match any address greater than breakpoint
address minus one. Comparator F is set to match any address lower than
breakpoint address plus breakpoint length. Addresses are aligned
to 32 bits.

When the breakpoint range starts at address 0, the breakpoint is set
to match comparator F only. When the breakpoint range end at address
0xffffffff, the breakpoint is set to match comparator E only.
Otherwise the breakpoint is set to match comparator E and F.

At the same time, use registers bit names instead of hardcoded values.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/05105deeaf63bc02151aea2cdeaf525534e0e9d4.1574790198.git.christophe.leroy@c-s.fr
2020-01-23 21:31:14 +11:00
Aleksa Sarai
fddb5d430a open: introduce openat2(2) syscall
/* Background. */
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].

This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).

Userspace also has a hard time figuring out whether a particular flag is
supported on a particular kernel. While it is now possible with
contemporary kernels (thanks to [3]), older kernels will expose unknown
flag bits through fcntl(F_GETFL). Giving a clear -EINVAL during
openat(2) time matches modern syscall designs and is far more
fool-proof.

In addition, the newly-added path resolution restriction LOOKUP flags
(which we would like to expose to user-space) don't feel related to the
pre-existing O_* flag set -- they affect all components of path lookup.
We'd therefore like to add a new flag argument.

Adding a new syscall allows us to finally fix the flag-ignoring problem,
and we can make it extensible enough so that we will hopefully never
need an openat3(2).

/* Syscall Prototype. */
  /*
   * open_how is an extensible structure (similar in interface to
   * clone3(2) or sched_setattr(2)). The size parameter must be set to
   * sizeof(struct open_how), to allow for future extensions. All future
   * extensions will be appended to open_how, with their zero value
   * acting as a no-op default.
   */
  struct open_how { /* ... */ };

  int openat2(int dfd, const char *pathname,
              struct open_how *how, size_t size);

/* Description. */
The initial version of 'struct open_how' contains the following fields:

  flags
    Used to specify openat(2)-style flags. However, any unknown flag
    bits or otherwise incorrect flag combinations (like O_PATH|O_RDWR)
    will result in -EINVAL. In addition, this field is 64-bits wide to
    allow for more O_ flags than currently permitted with openat(2).

  mode
    The file mode for O_CREAT or O_TMPFILE.

    Must be set to zero if flags does not contain O_CREAT or O_TMPFILE.

  resolve
    Restrict path resolution (in contrast to O_* flags they affect all
    path components). The current set of flags are as follows (at the
    moment, all of the RESOLVE_ flags are implemented as just passing
    the corresponding LOOKUP_ flag).

    RESOLVE_NO_XDEV       => LOOKUP_NO_XDEV
    RESOLVE_NO_SYMLINKS   => LOOKUP_NO_SYMLINKS
    RESOLVE_NO_MAGICLINKS => LOOKUP_NO_MAGICLINKS
    RESOLVE_BENEATH       => LOOKUP_BENEATH
    RESOLVE_IN_ROOT       => LOOKUP_IN_ROOT

open_how does not contain an embedded size field, because it is of
little benefit (userspace can figure out the kernel open_how size at
runtime fairly easily without it). It also only contains u64s (even
though ->mode arguably should be a u16) to avoid having padding fields
which are never used in the future.

Note that as a result of the new how->flags handling, O_PATH|O_TMPFILE
is no longer permitted for openat(2). As far as I can tell, this has
always been a bug and appears to not be used by userspace (and I've not
seen any problems on my machines by disallowing it). If it turns out
this breaks something, we can special-case it and only permit it for
openat(2) but not openat2(2).

After input from Florian Weimer, the new open_how and flag definitions
are inside a separate header from uapi/linux/fcntl.h, to avoid problems
that glibc has with importing that header.

/* Testing. */
In a follow-up patch there are over 200 selftests which ensure that this
syscall has the correct semantics and will correctly handle several
attack scenarios.

In addition, I've written a userspace library[4] which provides
convenient wrappers around openat2(RESOLVE_IN_ROOT) (this is necessary
because no other syscalls support RESOLVE_IN_ROOT, and thus lots of care
must be taken when using RESOLVE_IN_ROOT'd file descriptors with other
syscalls). During the development of this patch, I've run numerous
verification tests using libpathrs (showing that the API is reasonably
usable by userspace).

/* Future Work. */
Additional RESOLVE_ flags have been suggested during the review period.
These can be easily implemented separately (such as blocking auto-mount
during resolution).

Furthermore, there are some other proposed changes to the openat(2)
interface (the most obvious example is magic-link hardening[5]) which
would be a good opportunity to add a way for userspace to restrict how
O_PATH file descriptors can be re-opened.

Another possible avenue of future work would be some kind of
CHECK_FIELDS[6] flag which causes the kernel to indicate to userspace
which openat2(2) flags and fields are supported by the current kernel
(to avoid userspace having to go through several guesses to figure it
out).

[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVjZU_6Q@mail.gmail.com
[3]: commit 629e014bb8 ("fs: completely ignore unknown open flags")
[4]: https://sourceware.org/bugzilla/show_bug.cgi?id=17523
[5]: https://lore.kernel.org/lkml/20190930183316.10190-2-cyphar@cyphar.com/
[6]: https://youtu.be/ggD-eb3yPVs

Suggested-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2020-01-18 09:19:18 -05:00
Nicholas Piggin
ed0bc98f8c powerpc/64s: Reimplement power4_idle code in C
This implements the tricky tracing and soft irq handling bits in C,
leaving the low level bit to asm.

A functional difference is that this redirects the interrupt exit to
a return stub to execute blr, rather than the lr address itself. This
is probably barely measurable on real hardware, but it keeps the link
stack balanced.

Tested with QEMU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Move power4_fixup_nap back into exceptions-64s.S]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711022404.18132-1-npiggin@gmail.com
2020-01-16 14:59:37 +10:00
Arvind Sankar
4c82266d15 arch/powerpc/setup: Drop dummy_con initialization
con_init in tty/vt.c will now set conswitchp to dummy_con if it's unset.
Drop it from arch setup code.

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
Link: https://lore.kernel.org/r/20191218214506.49252-18-nivedita@alum.mit.edu
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-14 15:29:17 +01:00
Sargun Dhillon
9a2cef09c8
arch: wire up pidfd_getfd syscall
This wires up the pidfd_getfd syscall for all architectures.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20200107175927.4558-4-sargun@sargun.me
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2020-01-13 21:49:47 +01:00
Oliver O'Halloran
1c7f4fe86f powerpc/pci: Remove pcibios_setup_bus_devices()
With the previous patch applied pcibios_setup_device() will always be run
when pcibios_bus_add_device() is called. There are several code paths where
pcibios_setup_bus_device() is still called (the PowerPC specific PCI
hotplug support is one) so with just the previous patch applied the setup
can be run multiple times on a device, once before the device is added
to the bus and once after.

There's no need to run the setup in the early case any more so just
remove it entirely.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191028085424.12006-3-oohall@gmail.com
2020-01-06 16:25:29 +11:00
Shawn Anastasio
30d87ef8b3 powerpc/pci: Fix pcibios_setup_device() ordering
Move PCI device setup from pcibios_add_device() and pcibios_fixup_bus() to
pcibios_bus_add_device(). This ensures that platform-specific DMA and IOMMU
setup occurs after the device has been registered in sysfs, which is a
requirement for IOMMU group assignment to work

This fixes IOMMU group assignment for hotplugged devices on pseries, where
the existing behavior results in IOMMU assignment before registration.

Thanks to Lukas Wunner <lukas@wunner.de> for the suggestion.

Signed-off-by: Shawn Anastasio <shawn@anastas.io>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191028085424.12006-2-oohall@gmail.com
2020-01-06 16:25:28 +11:00
Oliver O'Halloran
3b5b9997b3 powerpc/powernv/iov: Ensure the pdn for VFs always contains a valid PE number
On pseries there is a bug with adding hotplugged devices to an IOMMU
group. For a number of dumb reasons fixing that bug first requires
re-working how VFs are configured on PowerNV. For background, on
PowerNV we use the pcibios_sriov_enable() hook to do two things:

  1. Create a pci_dn structure for each of the VFs, and
  2. Configure the PHB's internal BARs so the MMIO range for each VF
     maps to a unique PE.

Roughly speaking a PE is the hardware counterpart to a Linux IOMMU
group since all the devices in a PE share the same IOMMU table. A PE
also defines the set of devices that should be isolated in response to
a PCI error (i.e. bad DMA, UR/CA, AER events, etc). When isolated all
MMIO and DMA traffic to and from devicein the PE is blocked by the
root complex until the PE is recovered by the OS.

The requirement to block MMIO causes a giant headache because the P8
PHB generally uses a fixed mapping between MMIO addresses and PEs. As
a result we need to delay configuring the IOMMU groups for device
until after MMIO resources are assigned. For physical devices (i.e.
non-VFs) the PE assignment is done in pcibios_setup_bridge() which is
called immediately after the MMIO resources for downstream
devices (and the bridge's windows) are assigned. For VFs the setup is
more complicated because:

  a) pcibios_setup_bridge() is not called again when VFs are activated, and
  b) The pci_dev for VFs are created by generic code which runs after
     pcibios_sriov_enable() is called.

The work around for this is a two step process:

  1. A fixup in pcibios_add_device() is used to initialised the cached
     pe_number in pci_dn, then
  2. A bus notifier then adds the device to the IOMMU group for the PE
     specified in pci_dn->pe_number.

A side effect fixing the pseries bug mentioned in the first paragraph
is moving the fixup out of pcibios_add_device() and into
pcibios_bus_add_device(), which is called much later. This results in
step 2. failing because pci_dn->pe_number won't be initialised when
the bus notifier is run.

We can fix this by removing the need for the fixup. The PE for a VF is
known before the VF is even scanned so we can initialise
pci_dn->pe_number pcibios_sriov_enable() instead. Unfortunately,
moving the initialisation causes two problems:

  1. We trip the WARN_ON() in the current fixup code, and
  2. The EEH core clears pdn->pe_number when recovering a VF and
     relies on the fixup to correctly re-set it.

The only justification for either of these is a comment in
eeh_rmv_device() suggesting that pdn->pe_number *must* be set to
IODA_INVALID_PE in order for the VF to be scanned. However, this
comment appears to have no basis in reality. Both bugs can be fixed by
just deleting the code.

Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191028085424.12006-1-oohall@gmail.com
2020-01-06 16:25:28 +11:00
Jordan Niethe
5290ae2b8e powerpc/64: Use {SAVE,REST}_NVGPRS macros
In entry_64.S there are places that open code saving and restoring the
non-volatile registers. There are already macros for doing this so use
them.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191211023552.16480-1-jniethe5@gmail.com
2020-01-06 16:25:26 +11:00
Ingo Molnar
1e5f8a3085 Linux 5.5-rc3
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl4AEiYeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGR3sH/ixrBBYUVyjRPOxS
 ce4iVoTqphGSoAzq/3FA1YZZOPQ/Ep0NXL4L2fTGxmoiqIiuy8JPp07/NKbHQjj1
 Rt6PGm6cw2pMJHaK9gRdlTH/6OyXkp06OkH1uHqKYrhPnpCWDnj+i2SHAX21Hr1y
 oBQh4/XKvoCMCV96J2zxRsLvw8OkQFE0ouWWfj6LbpXIsmWZ++s0OuaO1cVdP/oG
 j+j2Voi3B3vZNQtGgJa5W7YoZN5Qk4ZIj9bMPg7bmKRd3wNB228AiJH2w68JWD/I
 jCA+JcITilxC9ud96uJ6k7SMS2ufjQlnP0z6Lzd0El1yGtHYRcPOZBgfOoPU2Euf
 33WGSyI=
 =iEwx
 -----END PGP SIGNATURE-----

Merge tag 'v5.5-rc3' into sched/core, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-25 10:41:37 +01:00
Greg Kroah-Hartman
749e4121d6 Merge 5.5-rc3 into tty-next
We need the tty/serial fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-23 06:59:19 -05:00
Dmitry Safonov
d68fefdd5b tty/serial: Migrate 8250_fsl to use has_sysrq
The SUPPORT_SYSRQ ifdeffery is not nice as:
- May create misunderstanding about sizeof(struct uart_port) between
  different objects
- Prevents moving functions from serial_core.h
- Reduces readability (well, it's ifdeffery - it's hard to follow)

In order to remove SUPPORT_SYSRQ, has_sysrq variable has been added.
Initialise it in driver's probe and remove ifdeffery.

In contrast to 8250/8250_of, legacy_serial on powerpc does fill
(struct plat_serial8250_port). The reason is likely that it's done on
device_initcall(), not on probe. So, 8250_core is not yet probed.

Propagate value from platform_device on 8250 probe - in case powepc
legacy driver it's initialized on initcall, in case 8250_of it will be
initialized later on of_platform_serial_setup().

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Dmitry Safonov <dima@arista.com>
Link: https://lore.kernel.org/r/20191213000657.931618-6-dima@arista.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-12-18 15:04:42 +01:00
Christophe Leroy
099bc4812f powerpc/irq: fix stack overflow verification
Before commit 0366a1c70b ("powerpc/irq: Run softirqs off the top of
the irq stack"), check_stack_overflow() was called by do_IRQ(), before
switching to the irq stack.
In that commit, do_IRQ() was renamed __do_irq(), and is now executing
on the irq stack, so check_stack_overflow() has just become almost
useless.

Move check_stack_overflow() call in do_IRQ() to do the check while
still on the current stack.

Fixes: 0366a1c70b ("powerpc/irq: Run softirqs off the top of the irq stack")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e033aa8116ab12b7ca9a9c75189ad0741e3b9b5f.1575872340.git.christophe.leroy@c-s.fr
2019-12-14 07:59:36 +11:00
Thomas Gleixner
fdc5569eab sched/rt, powerpc: Use CONFIG_PREEMPTION
CONFIG_PREEMPTION is selected by CONFIG_PREEMPT and by CONFIG_PREEMPT_RT.
Both PREEMPT and PREEMPT_RT require the same functionality which today
depends on CONFIG_PREEMPT.

Switch the entry code over to use CONFIG_PREEMPTION.

[bigeasy: +Kconfig]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20191024160458.vlnf3wlcyjl2ich7@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-08 14:37:32 +01:00
Vincenzo Frascino
5522634562 powerpc: Fix vDSO clock_getres()
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().

In particular, posix_get_hrtimer_res() does:
    sec = 0;
    ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.

Fix the powerpc vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.

Fixes: a7f290dad3 ("[PATCH] powerpc: Merge vdso's and add vdso support to 32 bits kernel")
Cc: stable@vger.kernel.org
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
[chleroy: changed CLOCK_REALTIME_RES to CLOCK_HRTIMER_RES]
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a55eca3a5e85233838c2349783bcb5164dae1d09.1575273217.git.christophe.leroy@c-s.fr
2019-12-05 00:13:55 +11:00
Christophe Leroy
71eb40fc53 powerpc/kasan: Fix boot failure with RELOCATABLE && FSL_BOOKE
When enabling CONFIG_RELOCATABLE and CONFIG_KASAN on FSL_BOOKE,
the kernel doesn't boot.

relocate_init() requires KASAN early shadow area to be set up because
it needs access to the device tree through generic functions.

Call kasan_early_init() before calling relocate_init()

Reported-by: Lexi Shao <shaolexi@huawei.com>
Fixes: 2edb16efc8 ("powerpc/32: Add KASAN support")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b58426f1664a4b344ff696d18cacf3b3e8962111.1575036985.git.christophe.leroy@c-s.fr
2019-12-05 00:11:43 +11:00
Linus Torvalds
ceb3074745 y2038: syscall implementation cleanups
This is a series of cleanups for the y2038 work, mostly intended
 for namespace cleaning: the kernel defines the traditional
 time_t, timeval and timespec types that often lead to y2038-unsafe
 code. Even though the unsafe usage is mostly gone from the kernel,
 having the types and associated functions around means that we
 can still grow new users, and that we may be missing conversions
 to safe types that actually matter.
 
 There are still a number of driver specific patches needed to
 get the last users of these types removed, those have been
 submitted to the respective maintainers.
 
 Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/
 Signed-off-by: Arnd Bergmann <arnd@arndb.de>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJd3D+wAAoJEJpsee/mABjZfdcQAJvl6e+4ddKoDMIVJqVCE25N
 meFRgA7S8jy6BefEVeUgI8TxK+amGO36szMBUEnZxSSxq9u+gd13m5bEK6Xq/ov7
 4KTAiA3Irm/W5FBTktu1zc5ROIra1Xj7jLdubf8wEC3viSXIXB3+68Y28iBN7D2O
 k9kSpwINC5lWeC8guZy2I+2yc4ywUEXao9nVh8C/J+FQtU02TcdLtZop9OhpAa8u
 U19VVH3WHkQI7ZfLvBTUiYK6tlYTiYCnpr8l6sm850CnVv1fzBW+DzmVhPJ6FdFd
 4m5staC0sQ6gVqtjVMBOtT5CdzREse6hpwbKo2GRWFroO5W9tljMOJJXHvv/f6kz
 DxrpUmj37JuRbqAbr8KDmQqPo6M2CRkxFxjol1yh5ER63u1xMwLm/PQITZIMDvPO
 jrFc2C2SdM2E9bKP/RMCVoKSoRwxCJ5IwJ2AF237rrU0sx/zB2xsrOGssx5CWEgc
 3bbk6tDQujJJubnCfgRy1tTxpLZOHEEKw8YhFLLbR2LCtA9pA/0rfLLad16cjA5e
 5jIHxfsFc23zgpzrJeB7kAF/9xgu1tlA5BotOs3VBE89LtWOA9nK5dbPXng6qlUe
 er3xLCfS38ovhUw6DusQpaYLuaYuLM7DKO4iav9kuTMcY9GkbPk7vDD3KPGh2goy
 hY5cSM8+kT1q/THLnUBH
 =Bdbv
 -----END PGP SIGNATURE-----

Merge tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground

Pull y2038 cleanups from Arnd Bergmann:
 "y2038 syscall implementation cleanups

  This is a series of cleanups for the y2038 work, mostly intended for
  namespace cleaning: the kernel defines the traditional time_t, timeval
  and timespec types that often lead to y2038-unsafe code. Even though
  the unsafe usage is mostly gone from the kernel, having the types and
  associated functions around means that we can still grow new users,
  and that we may be missing conversions to safe types that actually
  matter.

  There are still a number of driver specific patches needed to get the
  last users of these types removed, those have been submitted to the
  respective maintainers"

Link: https://lore.kernel.org/lkml/20191108210236.1296047-1-arnd@arndb.de/

* tag 'y2038-cleanups-5.5' of git://git.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: (26 commits)
  y2038: alarm: fix half-second cut-off
  y2038: ipc: fix x32 ABI breakage
  y2038: fix typo in powerpc vdso "LOPART"
  y2038: allow disabling time32 system calls
  y2038: itimer: change implementation to timespec64
  y2038: move itimer reset into itimer.c
  y2038: use compat_{get,set}_itimer on alpha
  y2038: itimer: compat handling to itimer.c
  y2038: time: avoid timespec usage in settimeofday()
  y2038: timerfd: Use timespec64 internally
  y2038: elfcore: Use __kernel_old_timeval for process times
  y2038: make ns_to_compat_timeval use __kernel_old_timeval
  y2038: socket: use __kernel_old_timespec instead of timespec
  y2038: socket: remove timespec reference in timestamping
  y2038: syscalls: change remaining timeval to __kernel_old_timeval
  y2038: rusage: use __kernel_old_timeval
  y2038: uapi: change __kernel_time_t to __kernel_old_time_t
  y2038: stat: avoid 'time_t' in 'struct stat'
  y2038: ipc: remove __kernel_time_t reference from headers
  y2038: vdso: powerpc: avoid timespec references
  ...
2019-12-01 14:00:59 -08:00
Linus Torvalds
7794b1d418 powerpc updates for 5.5
Highlights:
 
  - Infrastructure for secure boot on some bare metal Power9 machines. The
    firmware support is still in development, so the code here won't actually
    activate secure boot on any existing systems.
 
  - A change to xmon (our crash handler / pseudo-debugger) to restrict it to
    read-only mode when the kernel is lockdown'ed, otherwise it's trivial to drop
    into xmon and modify kernel data, such as the lockdown state.
 
  - Support for KASLR on 32-bit BookE machines (Freescale / NXP).
 
  - Fixes for our flush_icache_range() and __kernel_sync_dicache() (VDSO) to work
    with memory ranges >4GB.
 
  - Some reworks of the pseries CMM (Cooperative Memory Management) driver to
    make it behave more like other balloon drivers and enable some cleanups of
    generic mm code.
 
  - A series of fixes to our hardware breakpoint support to properly handle
    unaligned watchpoint addresses.
 
 Plus a bunch of other smaller improvements, fixes and cleanups.
 
 Thanks to:
   Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V, Anthony Steinhauser,
   Cédric Le Goater, Chris Packham, Chris Smart, Christophe Leroy, Christopher M.
   Riedl, Christoph Hellwig, Claudio Carvalho, Daniel Axtens, David Hildenbrand,
   Deb McLemore, Diana Craciun, Eric Richter, Geert Uytterhoeven, Greg
   Kroah-Hartman, Greg Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason
   Yan, Krzysztof Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M.
   Rodrigues, Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
   Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes, Ravi
   Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth, Tyrel
   Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3hBycTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgApBEACk91MEQDYJ9MF9I6uN+85qb5p4pMsp
 rGzqnpt+XFidbDAc3eP63pYfIDSo3jtkQ2YL7shAnDOTvkO0md+Vqkl9Aq/G6FIf
 lDBlwbgkXMSxS/O2Lpvfn4NZAoK6dKmiV55LSgfliM62X3e2Saeg6TR55wBTgJ6/
 SlYPDwZfcVHOAiFS3UmfB+hkiIZk+AI5Zr5VAZvT2ZmeH36yAWkq4JgJI1uAk6m1
 /7iCnlfUjx/nl/BhnA3kjjmAgGCJ5s/WuVgwFMz47XpMBWGBhLWpMh/NqDTFb8ca
 kpiVQoVPQe2xyO3pL/kOwBy6sii26ftfHDhLKMy1hJdEhVQzS5LerPIMeh1qsU8Q
 hV/Cj+jfsrS/vBDOehj3jwx93t+861PmTOqgLnpYQ6Ltrt+2B/74+fufGMHE1kI3
 Ffo7xvNw4sw6bSziDxDFqUx2P1dFN5D5EJsJsYM98ekkVAAkzNqCDRvfD2QI8Pif
 VXWPYXqtNJTrVPJA0D7Yfo9FDNwhANd0f1zi7r/U5mVXBFUyKOlGqTQSkXgMrVeK
 3I7wHPOVGgdA5UUkfcd3pcuqsY081U9E//o5PUfj8ybO5JCwly8NoatbG+xHmKia
 a72uJT8MjCo9mGCHKDrwi9l/kqms6ZSv8RP+yMhGuB52YoiGc6PpVyab5jXIUd1N
 yTtBlC0YGW1JYw==
 =JHzg
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "Highlights:

   - Infrastructure for secure boot on some bare metal Power9 machines.
     The firmware support is still in development, so the code here
     won't actually activate secure boot on any existing systems.

   - A change to xmon (our crash handler / pseudo-debugger) to restrict
     it to read-only mode when the kernel is lockdown'ed, otherwise it's
     trivial to drop into xmon and modify kernel data, such as the
     lockdown state.

   - Support for KASLR on 32-bit BookE machines (Freescale / NXP).

   - Fixes for our flush_icache_range() and __kernel_sync_dicache()
     (VDSO) to work with memory ranges >4GB.

   - Some reworks of the pseries CMM (Cooperative Memory Management)
     driver to make it behave more like other balloon drivers and enable
     some cleanups of generic mm code.

   - A series of fixes to our hardware breakpoint support to properly
     handle unaligned watchpoint addresses.

  Plus a bunch of other smaller improvements, fixes and cleanups.

  Thanks to: Alastair D'Silva, Andrew Donnellan, Aneesh Kumar K.V,
  Anthony Steinhauser, Cédric Le Goater, Chris Packham, Chris Smart,
  Christophe Leroy, Christopher M. Riedl, Christoph Hellwig, Claudio
  Carvalho, Daniel Axtens, David Hildenbrand, Deb McLemore, Diana
  Craciun, Eric Richter, Geert Uytterhoeven, Greg Kroah-Hartman, Greg
  Kurz, Gustavo L. F. Walbon, Hari Bathini, Harish, Jason Yan, Krzysztof
  Kozlowski, Leonardo Bras, Mathieu Malaterre, Mauro S. M. Rodrigues,
  Michal Suchanek, Mimi Zohar, Nathan Chancellor, Nathan Lynch, Nayna
  Jain, Nick Desaulniers, Oliver O'Halloran, Qian Cai, Rasmus Villemoes,
  Ravi Bangoria, Sam Bobroff, Santosh Sivaraj, Scott Wood, Thomas Huth,
  Tyrel Datwyler, Vaibhav Jain, Valentin Longchamp, YueHaibing"

* tag 'powerpc-5.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (144 commits)
  powerpc/fixmap: fix crash with HIGHMEM
  x86/efi: remove unused variables
  powerpc: Define arch_is_kernel_initmem_freed() for lockdep
  powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
  powerpc: Avoid clang warnings around setjmp and longjmp
  powerpc: Don't add -mabi= flags when building with Clang
  powerpc: Fix Kconfig indentation
  powerpc/fixmap: don't clear fixmap area in paging_init()
  selftests/powerpc: spectre_v2 test must be built 64-bit
  powerpc/powernv: Disable native PCIe port management
  powerpc/kexec: Move kexec files into a dedicated subdir.
  powerpc/32: Split kexec low level code out of misc_32.S
  powerpc/sysdev: drop simple gpio
  powerpc/83xx: map IMMR with a BAT.
  powerpc/32s: automatically allocate BAT in setbat()
  powerpc/ioremap: warn on early use of ioremap()
  powerpc: Add support for GENERIC_EARLY_IOREMAP
  powerpc/fixmap: Use __fix_to_virt() instead of fix_to_virt()
  powerpc/8xx: use the fixmapped IMMR in cpm_reset()
  powerpc/8xx: add __init to cpm1 init functions
  ...
2019-11-30 14:35:43 -08:00
Linus Torvalds
80eb5fea3c powerpc fixes for Spectre-RSB
We failed to activate the mitigation for Spectre-RSB (Return Stack
 Buffer, aka. ret2spec) on context switch, on CPUs prior to Power9
 DD2.3.
 
 That allows a process to poison the RSB (called Link Stack on Power
 CPUs) and possibly misdirect speculative execution of another process.
 If the victim process can be induced to execute a leak gadget then it
 may be possible to extract information from the victim via a side
 channel.
 
 The fix is to correctly activate the link stack flush mitigation on
 all CPUs that have any mitigation of Spectre v2 in userspace enabled.
 
 There's a second commit which adds a link stack flush in the KVM guest
 exit path. A leak via that path has not been demonstrated, but we
 believe it's at least theoretically possible.
 
 This is the fix for CVE-2019-18660.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl3eUXsTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEXtD/4qiCp4OHo+MEFbDyqZHZSYdFihpZ2B
 9s8yQKMaL0WWVJU0rlKSY0fDW/W0pLUn1zoREY9kRIHrQQi9wd5kg6s2kZtDeIPZ
 XPANeOpicMJjKGA+s/CqJfJZmGhzQ6VYplg/qevjvgOZqn8QsQhljg85w3Tr2wjo
 oXyi/0ZNv957pYrTHu08YIRr5OxalcE6Cxb4hZBqwbcubwKANSifLb72hcDEkNdR
 wshmt6mZUMtW8ToaGGt2b0csF2I0TClvBLQV8bxlbMZNFPYgUfBaHyCtHnv6bX65
 Jlgyw46pv9o0aeIF24rmP9jDEX+Hcig5Qu/EdLkd9lDl5YMxxVv9LlGq8tt7TQjI
 J97DeUFYjvePGMzirPFc7EvEoN35f19/5IuZUEQQ8wE4I/R1gNqOxbpGUqvReTbA
 +WJHqqT6sbJ1ys/mWRlGYMkn1xPNG3scpTNNh9/f3f/+ci3knOeYNeieVjjFvIv0
 4+toMQGIU7gB0mU67oLyClygOvC0DeBSQk8nFk0pznzUqdQlsqnbbI5O0KWFjf57
 jZV9l5khfdkkZiTIkGEZ3RY7X4pcrKd4kI+5+2OLiZOQVw2rudE3+ocysxP0osmA
 Ec+dr3uxL1YKdR4GVvk5mCMNlln6PpfT5Y21YSiipnGsWC1hvYkbPaj4u/T5oV5T
 B1bP/R7gQw4yfQ==
 =NQFt
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-spectre-rsb' of powerpc-CVE-2019-18660.bundle

Pull powerpc Spectre-RSB fixes from Michael Ellerman:
 "We failed to activate the mitigation for Spectre-RSB (Return Stack
  Buffer, aka. ret2spec) on context switch, on CPUs prior to Power9
  DD2.3.

  That allows a process to poison the RSB (called Link Stack on Power
  CPUs) and possibly misdirect speculative execution of another process.
  If the victim process can be induced to execute a leak gadget then it
  may be possible to extract information from the victim via a side
  channel.

  The fix is to correctly activate the link stack flush mitigation on
  all CPUs that have any mitigation of Spectre v2 in userspace enabled.

  There's a second commit which adds a link stack flush in the KVM guest
  exit path. A leak via that path has not been demonstrated, but we
  believe it's at least theoretically possible.

  This is the fix for CVE-2019-18660"

* tag 'powerpc-spectre-rsb' of /home/torvalds/Downloads/powerpc-CVE-2019-18660.bundle:
  KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
  powerpc/book3s64: Fix link stack flush on context switch
2019-11-27 11:25:04 -08:00
Linus Torvalds
77a05940ee Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The biggest changes in this cycle were:

   - Make kcpustat vtime aware (Frederic Weisbecker)

   - Rework the CFS load_balance() logic (Vincent Guittot)

   - Misc cleanups, smaller enhancements, fixes.

  The load-balancing rework is the most intrusive change: it replaces
  the old heuristics that have become less meaningful after the
  introduction of the PELT metrics, with a grounds-up load-balancing
  algorithm.

  As such it's not really an iterative series, but replaces the old
  load-balancing logic with the new one. We hope there are no
  performance regressions left - but statistically it's highly probable
  that there *is* going to be some workload that is hurting from these
  chnages. If so then we'd prefer to have a look at that workload and
  fix its scheduling, instead of reverting the changes"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
  rackmeter: Use vtime aware kcpustat accessor
  leds: Use all-in-one vtime aware kcpustat accessor
  cpufreq: Use vtime aware kcpustat accessors for user time
  procfs: Use all-in-one vtime aware kcpustat accessor
  sched/vtime: Bring up complete kcpustat accessor
  sched/cputime: Support other fields on kcpustat_field()
  sched/cpufreq: Move the cfs_rq_util_change() call to cpufreq_update_util()
  sched/fair: Add comments for group_type and balancing at SD_NUMA level
  sched/fair: Fix rework of find_idlest_group()
  sched/uclamp: Fix overzealous type replacement
  sched/Kconfig: Fix spelling mistake in user-visible help text
  sched/core: Further clarify sched_class::set_next_task()
  sched/fair: Use mul_u32_u32()
  sched/core: Simplify sched_class::pick_next_task()
  sched/core: Optimize pick_next_task()
  sched/core: Make pick_next_task_idle() more consistent
  sched/fair: Better document newidle_balance()
  leds: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  cpufreq: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  procfs: Use vtime aware kcpustat accessor to fetch CPUTIME_SYSTEM
  ...
2019-11-26 15:23:14 -08:00
Linus Torvalds
1d87200446 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Cross-arch changes to move the linker sections for NOTES and
     EXCEPTION_TABLE into the RO_DATA area, where they belong on most
     architectures. (Kees Cook)

   - Switch the x86 linker fill byte from x90 (NOP) to 0xcc (INT3), to
     trap jumps into the middle of those padding areas instead of
     sliding execution. (Kees Cook)

   - A thorough cleanup of symbol definitions within x86 assembler code.
     The rather randomly named macros got streamlined around a
     (hopefully) straightforward naming scheme:

        SYM_START(name, linkage, align...)
        SYM_END(name, sym_type)

        SYM_FUNC_START(name)
        SYM_FUNC_END(name)

        SYM_CODE_START(name)
        SYM_CODE_END(name)

        SYM_DATA_START(name)
        SYM_DATA_END(name)

     etc - with about three times of these basic primitives with some
     label, local symbol or attribute variant, expressed via postfixes.

     No change in functionality intended. (Jiri Slaby)

   - Misc other changes, cleanups and smaller fixes"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
  x86/entry/64: Remove pointless jump in paranoid_exit
  x86/entry/32: Remove unused resume_userspace label
  x86/build/vdso: Remove meaningless CFLAGS_REMOVE_*.o
  m68k: Convert missed RODATA to RO_DATA
  x86/vmlinux: Use INT3 instead of NOP for linker fill bytes
  x86/mm: Report actual image regions in /proc/iomem
  x86/mm: Report which part of kernel image is freed
  x86/mm: Remove redundant address-of operators on addresses
  xtensa: Move EXCEPTION_TABLE to RO_DATA segment
  powerpc: Move EXCEPTION_TABLE to RO_DATA segment
  parisc: Move EXCEPTION_TABLE to RO_DATA segment
  microblaze: Move EXCEPTION_TABLE to RO_DATA segment
  ia64: Move EXCEPTION_TABLE to RO_DATA segment
  h8300: Move EXCEPTION_TABLE to RO_DATA segment
  c6x: Move EXCEPTION_TABLE to RO_DATA segment
  arm64: Move EXCEPTION_TABLE to RO_DATA segment
  alpha: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Move EXCEPTION_TABLE to RO_DATA segment
  x86/vmlinux: Actually use _etext for the end of the text segment
  vmlinux.lds.h: Allow EXCEPTION_TABLE to live in RO_DATA
  ...
2019-11-26 10:42:40 -08:00
Nathan Chancellor
8dcd71b45d powerpc/prom_init: Use -ffreestanding to avoid a reference to bcmp
LLVM revision r374662 gives LLVM the ability to convert certain loops
into a reference to bcmp as an optimization; this breaks
prom_init_check.sh:

    CALL    arch/powerpc/kernel/prom_init_check.sh
  Error: External symbol 'bcmp' referenced from prom_init.c
  make[2]: *** [arch/powerpc/kernel/Makefile:196: prom_init_check] Error 1

bcmp is defined in lib/string.c as a wrapper for memcmp so this could
be added to the whitelist. However, commit
450e7dd400 ("powerpc/prom_init: don't use string functions from
lib/") copied memcmp as prom_memcmp to avoid KASAN instrumentation so
having bcmp be resolved to regular memcmp would break that assumption.
Furthermore, because the compiler is the one that inserted bcmp, we
cannot provide something like prom_bcmp.

To prevent LLVM from being clever with optimizations like this, use
-ffreestanding to tell LLVM we are not hosted so it is not free to
make transformations like this.

Reviewed-by: Nick Desaulneris <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191119045712.39633-4-natechancellor@gmail.com
2019-11-25 21:45:43 +11:00
Arnd Bergmann
1c11ca7a05 y2038: fix typo in powerpc vdso "LOPART"
The earlier patch introduced a typo, change LOWPART back to
LOPART.

Fixes: 176ed98c8a ("y2038: vdso: powerpc: avoid timespec references")
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-21 15:19:49 +01:00
Christophe Leroy
793b08e2ef powerpc/kexec: Move kexec files into a dedicated subdir.
arch/powerpc/kernel/ contains 8 files dedicated to kexec.

Move them into a dedicated subdirectory.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Move to a/p/kexec, drop the 'machine' naming and use 'core' instead]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/afbef97ec6a978574a5cf91a4441000e0a9da42a.1572351221.git.christophe.leroy@c-s.fr
2019-11-21 15:41:34 +11:00
Christophe Leroy
9f7bd92015 powerpc/32: Split kexec low level code out of misc_32.S
Almost half of misc_32.S is dedicated to kexec.
That's the relocation function for kexec.

Drop it into a dedicated kexec_relocate_32.S

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e235973a1198195763afd3b6baffa548a83f4611.1572351221.git.christophe.leroy@c-s.fr
2019-11-21 15:41:34 +11:00
Christophe Leroy
265c3491c4 powerpc: Add support for GENERIC_EARLY_IOREMAP
Add support for GENERIC_EARLY_IOREMAP.

Let's define 16 slots of 256Kbytes each for early ioremap.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/412c7eaa6a373d8f82a3c3ee01e6a65a1a6589de.1568295907.git.christophe.leroy@c-s.fr
2019-11-19 19:38:38 +11:00
Michael Ellerman
98ba8e8013 Merge branch 'next' of https://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next
Merge changes from Scott:
  Includes a couple of device tree fixes, a spelling fix, and leftover
  code cleanup.
2019-11-18 22:26:59 +11:00
Geert Uytterhoeven
3a0990ca1a powerpc/booke: Spelling s/date/data/
Caching dates is never a good idea ;-)

Fixes: e7affb1dba ("powerpc/cache: add cache flush operation for various e500")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Scott Wood <oss@buserror.net>
2019-11-17 01:56:31 -06:00
Arnd Bergmann
75d319c06e y2038: syscalls: change remaining timeval to __kernel_old_timeval
All of the remaining syscalls that pass a timeval (gettimeofday, utime,
futimesat) can trivially be changed to pass a __kernel_old_timeval
instead, which has a compatible layout, but avoids ambiguity with
the timeval type in user space.

Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:29 +01:00
Arnd Bergmann
176ed98c8a y2038: vdso: powerpc: avoid timespec references
As a preparation to stop using 'struct timespec' in the kernel,
change the powerpc vdso implementation:

- split up the vdso data definition to have equivalent members
   for seconds and nanoseconds instead of an xtime structure

- use timespec64 as an intermediate for the xtime update

- change the asm-offsets definition to be based the appropriate
  fixed-length types

This is only a temporary fix for changing the types, in order
to actually support a 64-bit safe vdso32 version of clock_gettime(),
the entire powerpc vdso should be replaced with the generic
lib/vdso/ implementation. If that happens first, this patch
becomes obsolete.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:28 +01:00
Arnd Bergmann
ddccf40fe8 y2038: vdso: change timeval to __kernel_old_timeval
The gettimeofday() function in vdso uses the traditional 'timeval'
structure layout, which will be incompatible with future versions of
glibc on 32-bit architectures that use a 64-bit time_t.

This interface is problematic for y2038, when time_t overflows on 32-bit
architectures, but the plan so far is that a libc with 64-bit time_t
will not call into the gettimeofday() vdso helper at all, and only
have a method for entering clock_gettime().  This means we don't have
to fix it here, though we probably want to add a new clock_gettime()
entry point using a 64-bit version of 'struct timespec' at some point.

Changing the vdso code to use __kernel_old_timeval helps isolate
this usage from the other ones that still need to be fixed properly,
and it gets us closer to removing the 'timeval' definition from the
kernel sources.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2019-11-15 14:38:27 +01:00
Michael Ellerman
3df191118b Merge branch 'topic/kaslr-book3e32' into next
This is a slight rebase of Scott's next branch, which contained the
KASLR support for book3e 32-bit, to squash in a couple of small fixes.

See the	original pull request:
  https://lore.kernel.org/r/20191022232155.GA26174@home.buserror.net
2019-11-14 19:23:33 +11:00
Michael Ellerman
af2e8c68b9 KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel
On some systems that are vulnerable to Spectre v2, it is up to
software to flush the link stack (return address stack), in order to
protect against Spectre-RSB.

When exiting from a guest we do some house keeping and then
potentially exit to C code which is several stack frames deep in the
host kernel. We will then execute a series of returns without
preceeding calls, opening up the possiblity that the guest could have
poisoned the link stack, and direct speculative execution of the host
to a gadget of some sort.

To prevent this we add a flush of the link stack on exit from a guest.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-14 15:37:59 +11:00
Michael Ellerman
39e72bf96f powerpc/book3s64: Fix link stack flush on context switch
In commit ee13cb249f ("powerpc/64s: Add support for software count
cache flush"), I added support for software to flush the count
cache (indirect branch cache) on context switch if firmware told us
that was the required mitigation for Spectre v2.

As part of that code we also added a software flush of the link
stack (return address stack), which protects against Spectre-RSB
between user processes.

That is all correct for CPUs that activate that mitigation, which is
currently Power9 Nimbus DD2.3.

What I got wrong is that on older CPUs, where firmware has disabled
the count cache, we also need to flush the link stack on context
switch.

To fix it we create a new feature bit which is not set by firmware,
which tells us we need to flush the link stack. We set that when
firmware tells us that either of the existing Spectre v2 mitigations
are enabled.

Then we adjust the patching code so that if we see that feature bit we
enable the link stack flush. If we're also told to flush the count
cache in software then we fall through and do that also.

On the older CPUs we don't need to do do the software count cache
flush, firmware has disabled it, so in that case we patch in an early
return after the link stack flush.

The naming of some of the functions is awkward after this patch,
because they're called "count cache" but they also do link stack. But
we'll fix that up in a later commit to ease backporting.

This is the fix for CVE-2019-18660.

Reported-by: Anthony Steinhauser <asteinhauser@google.com>
Fixes: ee13cb249f ("powerpc/64s: Add support for software count cache flush")
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-14 15:37:52 +11:00
Jason Yan
74277f00b2 powerpc/fsl_booke/kaslr: export offset in VMCOREINFO ELF notes
Like all other architectures such as x86 or arm64, include KASLR offset
in VMCOREINFO ELF notes to assist in debugging. After this, we can use
crash --kaslr option to parse vmcore generated from a kaslr kernel.

Note: The crash tool needs to support --kaslr too.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:54 +11:00
Jason Yan
921a79b780 powerpc/fsl_booke/kaslr: dump out kernel offset information on panic
When kaslr is enabled, the kernel offset is different for every boot.
This brings some difficult to debug the kernel. Dump out the kernel
offset when panic so that we can easily debug the kernel.

This code is derived from x86/arm64 which has similar functionality.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Tested-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:51 +11:00
Jason Yan
2b0e86cc5d powerpc/fsl_booke/32: implement KASLR infrastructure
This patch add support to boot kernel from places other than KERNELBASE.
Since CONFIG_RELOCATABLE has already supported, what we need to do is
map or copy kernel to a proper place and relocate. Freescale Book-E
parts expect lowmem to be mapped by fixed TLB entries(TLB1). The TLB1
entries are not suitable to map the kernel directly in a randomized
region, so we chose to copy the kernel to a proper place and restart to
relocate.

The offset of the kernel was not randomized yet(a fixed 64M is set). We
will randomize it in the next patch.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Tested-by: Diana Craciun <diana.craciun@nxp.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Scott Wood <oss@buserror.net>
[mpe: Use PTRRELOC() in early_init()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:40 +11:00
Jason Yan
c061b38a3e powerpc/fsl_booke/32: introduce reloc_kernel_entry() helper
Add a new helper reloc_kernel_entry() to jump back to the start of the
new kernel. After we put the new kernel in a randomized place we can use
this new helper to enter the kernel and begin to relocate again.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Tested-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:37 +11:00
Jason Yan
aa1d2090e6 powerpc/fsl_booke/32: introduce create_kaslr_tlb_entry() helper
Add a new helper create_kaslr_tlb_entry() to create a tlb entry by the
virtual and physical address. This is a preparation to support boot kernel
at a randomized address.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Tested-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:34 +11:00
Jason Yan
8054df0570 powerpc: unify definition of M_IF_NEEDED
M_IF_NEEDED is defined too many times. Move it to a common place and
rename it to MAS2_M_IF_NEEDED which is much readable.

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Diana Craciun <diana.craciun@nxp.com>
Tested-by: Diana Craciun <diana.craciun@nxp.com>
Signed-off-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-11-13 19:27:24 +11:00
Michal Suchanek
565f9bc05e powerpc/fadump: when fadump is supported register the fadump sysfs files.
Currently it is not possible to distinguish the case when fadump is
supported by firmware and disabled in kernel and completely unsupported
using the kernel sysfs interface. User can investigate the devicetree
but it is more reasonable to provide sysfs files in case we get some
fadumpv2 in the future.

With this patch sysfs files are available whenever fadump is supported
by firmware.

There is duplicate message about lack of support by firmware in
fadump_reserve_mem and setup_fadump. Remove the duplicate message in
setup_fadump.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191107164757.15140-1-msuchanek@suse.de
2019-11-13 16:58:11 +11:00
Sam Bobroff
de84ffc3cc powerpc/eeh: differentiate duplicate detection message
Currently when an EEH error is detected, the system log receives the
same (or almost the same) message twice:

  EEH: PHB#0 failure detected, location: N/A
  EEH: PHB#0 failure detected, location: N/A
or
  EEH: eeh_dev_check_failure: Frozen PHB#0-PE#0 detected
  EEH: Frozen PHB#0-PE#0 detected

This looks like a bug, but in fact the messages are from different
functions and mean slightly different things.  So keep both but change
one of the messages slightly, so that it's clear they are different:

  EEH: PHB#0 failure detected, location: N/A
  EEH: Recovering PHB#0, location: N/A
or
  EEH: eeh_dev_check_failure: Frozen PHB#0-PE#0 detected
  EEH: Recovering PHB#0-PE#0

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/43817cb6e6631b0828b9a6e266f60d1f8ca8eb22.1571288375.git.sbobroff@linux.ibm.com
2019-11-13 16:58:10 +11:00
Andrew Donnellan
1db550f44a powerpc/64s/exception: Fix kaup -> kuap typo
It's KUAP, not KAUP. Fix typo in INT_COMMON macro.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191022060603.24101-1-ajd@linux.ibm.com
2019-11-13 16:58:08 +11:00
Thomas Huth
bbbd7f112c powerpc: Replace GPL boilerplate with SPDX identifiers
The FSF does not reside in "675 Mass Ave, Cambridge" anymore...
let's simply use proper SPDX identifiers instead.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190828060737.32531-1-thuth@redhat.com
2019-11-13 16:58:07 +11:00
Aneesh Kumar K.V
d7e02f7b79 powerpc/book3s/mm: Update Oops message to print the correct translation in use
Avoids confusion when printing Oops message like below

 Faulting instruction address: 0xc00000000008bdb4
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV

This was because we never clear the MMU_FTR_HPTE_TABLE feature flag
even if we run with radix translation. It was discussed that we should
look at this feature flag as an indication of the capability to run
hash translation and we should not clear the flag even if we run in
radix translation. All the code paths check for radix_enabled() check and
if found true consider we are running with radix translation. Follow the
same sequence for finding the MMU translation string to be used in Oops
message.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190711145814.17970-1-aneesh.kumar@linux.ibm.com
2019-11-13 16:58:07 +11:00
YueHaibing
090d5ab93d powerpc/64s: Fix debugfs_simple_attr.cocci warnings
Use DEFINE_DEBUGFS_ATTRIBUTE rather than DEFINE_SIMPLE_ATTRIBUTE
for debugfs files.

Semantic patch information:
Rationale: DEFINE_SIMPLE_ATTRIBUTE + debugfs_create_file()
imposes some significant overhead as compared to
DEFINE_DEBUGFS_ATTRIBUTE + debugfs_create_file_unsafe().

Generated by: scripts/coccinelle/api/debugfs/debugfs_simple_attr.cocci

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1543498518-107601-1-git-send-email-yuehaibing@huawei.com
2019-11-13 16:58:04 +11:00
Ravi Bangoria
27985b2a64 powerpc/watchpoint: Don't ignore extraneous exceptions blindly
On powerpc, watchpoint match range is double-word granular. On a
watchpoint hit, DAR is set to the first byte of overlap between actual
access and watched range. And thus it's quite possible that DAR does
not point inside user specified range. Ex, say user creates a
watchpoint with address range 0x1004 to 0x1007. So hw would be
configured to watch from 0x1000 to 0x1007. If there is a 4 byte access
from 0x1002 to 0x1005, DAR will point to 0x1002 and thus interrupt
handler considers it as extraneous, but it's actually not, because
part of the access belongs to what user has asked.

Instead of blindly ignoring the exception, get actual address range by
analysing an instruction, and ignore only if actual range does not
overlap with user specified range.

Note: The behavior is unchanged for 8xx.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-5-ravi.bangoria@linux.ibm.com
2019-11-13 16:58:03 +11:00
Ravi Bangoria
c3f68b0478 powerpc/watchpoint: Fix ptrace code that muck around with address/len
ptrace_set_debugreg() does not consider new length while overwriting
the watchpoint. Fix that. ppc_set_hwdebug() aligns watchpoint address
to doubleword boundary but does not change the length. If address
range is crossing doubleword boundary and length is less then 8, we
will lose samples from second doubleword. So fix that as well.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-4-ravi.bangoria@linux.ibm.com
2019-11-13 16:58:03 +11:00
Ravi Bangoria
b57aeab811 powerpc/watchpoint: Fix length calculation for unaligned target
Watchpoint match range is always doubleword(8 bytes) aligned on
powerpc. If the given range is crossing doubleword boundary, we need
to increase the length such that next doubleword also get
covered. Ex,

          address   len = 6 bytes
                |=========.
   |------------v--|------v--------|
   | | | | | | | | | | | | | | | | |
   |---------------|---------------|
    <---8 bytes--->

In such case, current code configures hw as:
  start_addr = address & ~HW_BREAKPOINT_ALIGN
  len = 8 bytes

And thus read/write in last 4 bytes of the given range is ignored.
Fix this by including next doubleword in the length.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-3-ravi.bangoria@linux.ibm.com
2019-11-13 16:58:03 +11:00
Ravi Bangoria
b811be615c powerpc/watchpoint: Introduce macros for watchpoint length
We are hadrcoding length everywhere in the watchpoint code. Introduce
macros for the length and use them.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191017093204.7511-2-ravi.bangoria@linux.ibm.com
2019-11-13 16:58:02 +11:00
Gustavo L. F. Walbon
4e706af3cd powerpc/security: Fix wrong message when RFI Flush is disable
The issue was showing "Mitigation" message via sysfs whatever the
state of "RFI Flush", but it should show "Vulnerable" when it is
disabled.

If you have "L1D private" feature enabled and not "RFI Flush" you are
vulnerable to meltdown attacks.

"RFI Flush" is the key feature to mitigate the meltdown whatever the
"L1D private" state.

SEC_FTR_L1D_THREAD_PRIV is a feature for Power9 only.

So the message should be as the truth table shows:

  CPU | L1D private | RFI Flush |                sysfs
  ----|-------------|-----------|-------------------------------------
   P9 |    False    |   False   | Vulnerable
   P9 |    False    |   True    | Mitigation: RFI Flush
   P9 |    True     |   False   | Vulnerable: L1D private per thread
   P9 |    True     |   True    | Mitigation: RFI Flush, L1D private per thread
   P8 |    False    |   False   | Vulnerable
   P8 |    False    |   True    | Mitigation: RFI Flush

Output before this fix:
  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
  Mitigation: RFI Flush, L1D private per thread
  # echo 0 > /sys/kernel/debug/powerpc/rfi_flush
  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
  Mitigation: L1D private per thread

Output after fix:
  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
  Mitigation: RFI Flush, L1D private per thread
  # echo 0 > /sys/kernel/debug/powerpc/rfi_flush
  # cat /sys/devices/system/cpu/vulnerabilities/meltdown
  Vulnerable: L1D private per thread

Signed-off-by: Gustavo L. F. Walbon <gwalbon@linux.ibm.com>
Signed-off-by: Mauro S. M. Rodrigues <maurosr@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20190502210907.42375-1-gwalbon@linux.ibm.com
2019-11-13 16:58:02 +11:00
Tyrel Datwyler
0a87ccd369 powerpc/pseries: Enable support for ibm,drc-info property
Advertise client support for the PAPR architected ibm,drc-info device
tree property during CAS handshake.

Fixes: c7a3275e0f ("powerpc/pseries: Revert support for ibm,drc-info devtree property")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573449697-5448-11-git-send-email-tyreld@linux.ibm.com
2019-11-13 16:58:00 +11:00
Michael Ellerman
d34a5709be Merge branch 'topic/secureboot' into next
Merge the secureboot support, as well as the IMA changes needed to
support it.

From Nayna's cover letter:
  In order to verify the OS kernel on PowerNV systems, secure boot
  requires X.509 certificates trusted by the platform. These are
  stored in secure variables controlled by OPAL, called OPAL secure
  variables. In order to enable users to manage the keys, the secure
  variables need to be exposed to userspace.

  OPAL provides the runtime services for the kernel to be able to
  access the secure variables. This patchset defines the kernel
  interface for the OPAL APIs. These APIs are used by the hooks, which
  load these variables to the keyring and expose them to the userspace
  for reading/writing.

  Overall, this patchset adds the following support:
    * expose secure variables to the kernel via OPAL Runtime API interface
    * expose secure variables to the userspace via kernel sysfs interface
    * load kernel verification and revocation keys to .platform and
      .blacklist keyring respectively.

  The secure variables can be read/written using simple linux
  utilities cat/hexdump.

  For example:
  Path to the secure variables is: /sys/firmware/secvar/vars

    Each secure variable is listed as directory.
    $ ls -l
    total 0
    drwxr-xr-x. 2 root root 0 Aug 20 21:20 db
    drwxr-xr-x. 2 root root 0 Aug 20 21:20 KEK
    drwxr-xr-x. 2 root root 0 Aug 20 21:20 PK

  The attributes of each of the secure variables are (for example: PK):
    $ ls -l
    total 0
    -r--r--r--. 1 root root  4096 Oct  1 15:10 data
    -r--r--r--. 1 root root 65536 Oct  1 15:10 size
    --w-------. 1 root root  4096 Oct  1 15:12 update

  The "data" is used to read the existing variable value using
  hexdump. The data is stored in ESL format. The "update" is used to
  write a new value using cat. The update is to be submitted as AUTH
  file.
2019-11-13 16:55:50 +11:00
Nayna Jain
bd5d9c743d powerpc: expose secure variables to userspace via sysfs
PowerNV secure variables, which store the keys used for OS kernel
verification, are managed by the firmware. These secure variables need to
be accessed by the userspace for addition/deletion of the certificates.

This patch adds the sysfs interface to expose secure variables for PowerNV
secureboot. The users shall use this interface for manipulating
the keys stored in the secure variables.

Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-3-git-send-email-nayna@linux.ibm.com
2019-11-13 00:33:22 +11:00
Nayna Jain
9155e2341a powerpc/powernv: Add OPAL API interface to access secure variable
The X.509 certificates trusted by the platform and required to secure
boot the OS kernel are wrapped in secure variables, which are
controlled by OPAL.

This patch adds firmware/kernel interface to read and write OPAL
secure variables based on the unique key.

This support can be enabled using CONFIG_OPAL_SECVAR.

Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Nayna Jain <nayna@linux.ibm.com>
Signed-off-by: Eric Richter <erichte@linux.ibm.com>
[mpe: Make secvar_ops __ro_after_init, only build opal-secvar.c if PPC_SECURE_BOOT=y]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1573441836-3632-2-git-send-email-nayna@linux.ibm.com
2019-11-13 00:33:22 +11:00