linux

Author	SHA1	Message	Date
Christophe Leroy	e28d0b6750	powerpc/lib/sstep: Don't use __{get/put}_user() on kernel addresses In the old days, when we didn't have kernel userspace access protection and had set_fs(), it was wise to use __get_user() and friends to read kernel memory. Nowadays, get_user() and put_user() are granting userspace access and are exclusively for userspace access. Convert single step emulation functions to user_access_begin() and friends and use unsafe_get_user() and unsafe_put_user(). When addressing kernel addresses, there is no need to open userspace access. And for book3s/32 it is particularly important to no try and open userspace access on kernel address, because that would break the content of kernel space segment registers. No guard has been put against that risk in order to avoid degrading performance. copy_from_kernel_nofault() and copy_to_kernel_nofault() should be used but they are out-of-line functions which would degrade performance. Those two functions are making use of __get_kernel_nofault() and __put_kernel_nofault() macros. Those two macros are just wrappers behind __get_user_size_goto() and __put_user_size_goto(). unsafe_get_user() and unsafe_put_user() are also wrappers of __get_user_size_goto() and __put_user_size_goto(). Use them to access kernel space. That allows refactoring userspace and kernelspace access. Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Depends-on: `4fe5cda9f8` ("powerpc/uaccess: Implement user_read_access_begin and user_write_access_begin") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/22831c9d17f948680a12c5292e7627288b15f713.1631817805.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:05 +11:00
Christophe Leroy	cbe654c779	powerpc: warn on emulation of dcbz instruction in kernel mode dcbz instruction shouldn't be used on non-cached memory. Using it on non-cached memory can result in alignment exception and implies a heavy handling. Instead of silentely emulating the instruction and resulting in high performance degradation, warn whenever an alignment exception is taken in kernel mode due to dcbz, so that the user is made aware that dcbz instruction has been used unexpectedly by the kernel. Reported-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2e3acfe63d289c6fba366e16973c9ab8369e8b75.1631803922.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:05 +11:00
Christophe Leroy	5c810ced36	powerpc/32: Add support for out-of-line static calls Add support for out-of-line static calls on PPC32. This change improve performance of calls to global function pointers by using direct calls instead of indirect calls. The trampoline is initialy populated with a 'blr' or branch to target, followed by an unreachable long jump sequence. In order to cater with parallele execution, the trampoline needs to be updated in a way that ensures it remains consistent at all time. This means we can't use the traditional lis/addi to load r12 with the target address, otherwise there would be a window during which the first instruction contains the upper part of the new target address while the second instruction still contains the lower part of the old target address. To avoid that the target address is stored just after the 'bctr' and loaded from there with a single instruction. Then, depending on the target distance, arch_static_call_transform() will either replace the first instruction by a direct 'bl <target>' or 'nop' in order to have the trampoline fall through the long jump sequence. For the special case of __static_call_return0(), to avoid the risk of a far branch, a version of it is inlined at the end of the trampoline. Performancewise the long jump sequence is probably not better than the indirect calls set by GCC when we don't use static calls, but such calls are unlikely to be required on powerpc32: With most configurations the kernel size is far below 32 Mbytes so only modules may happen to be too far. And even modules are likely to be close enough as they are allocated below the kernel core and as close as possible of the kernel text. static_call selftest is running successfully with this change. With this patch, __do_irq() has the following sequence to trace irq entries: c0004a00 <__SCT__tp_func_irq_entry>: c0004a00: 48 00 00 e0 b c0004ae0 <__traceiter_irq_entry> c0004a04: 3d 80 c0 00 lis r12,-16384 c0004a08: 81 8c 4a 1c lwz r12,18972(r12) c0004a0c: 7d 89 03 a6 mtctr r12 c0004a10: 4e 80 04 20 bctr c0004a14: 38 60 00 00 li r3,0 c0004a18: 4e 80 00 20 blr c0004a1c: 00 00 00 00 .long 0x0 ... c0005654 <__do_irq>: ... c0005664: 7c 7f 1b 78 mr r31,r3 ... c00056a0: 81 22 00 00 lwz r9,0(r2) c00056a4: 39 29 00 01 addi r9,r9,1 c00056a8: 91 22 00 00 stw r9,0(r2) c00056ac: 3d 20 c0 af lis r9,-16209 c00056b0: 81 29 74 cc lwz r9,29900(r9) c00056b4: 2c 09 00 00 cmpwi r9,0 c00056b8: 41 82 00 10 beq c00056c8 <__do_irq+0x74> c00056bc: 80 69 00 04 lwz r3,4(r9) c00056c0: 7f e4 fb 78 mr r4,r31 c00056c4: 4b ff f3 3d bl c0004a00 <__SCT__tp_func_irq_entry> Before this patch, __do_irq() was doing the following to trace irq entries: c0005700 <__do_irq>: ... c0005710: 7c 7e 1b 78 mr r30,r3 ... c000574c: 93 e1 00 0c stw r31,12(r1) c0005750: 81 22 00 00 lwz r9,0(r2) c0005754: 39 29 00 01 addi r9,r9,1 c0005758: 91 22 00 00 stw r9,0(r2) c000575c: 3d 20 c0 af lis r9,-16209 c0005760: 83 e9 f4 cc lwz r31,-2868(r9) c0005764: 2c 1f 00 00 cmpwi r31,0 c0005768: 41 82 00 24 beq c000578c <__do_irq+0x8c> c000576c: 81 3f 00 00 lwz r9,0(r31) c0005770: 80 7f 00 04 lwz r3,4(r31) c0005774: 7d 29 03 a6 mtctr r9 c0005778: 7f c4 f3 78 mr r4,r30 c000577c: 4e 80 04 21 bctrl c0005780: 85 3f 00 0c lwzu r9,12(r31) c0005784: 2c 09 00 00 cmpwi r9,0 c0005788: 40 82 ff e4 bne c000576c <__do_irq+0x6c> Behind the fact of now using a direct 'bl' instead of a 'load/mtctr/bctr' sequence, we can also see that we get one less register on the stack. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:05 +11:00
Christophe Leroy	8f7fadb4ba	powerpc/machdep: Remove stale functions from ppc_md structure ppc_md.iommu_save() is not set anymore by any platform after commit `c40785ad30` ("powerpc/dart: Use a cachable DART"). So iommu_save() has become a nop and can be removed. ppc_md.show_percpuinfo() is not set anymore by any platform after commit `4350147a81` ("[PATCH] ppc64: SMU based macs cpufreq support"). Last users of ppc_md.rtc_read_val() and ppc_md.rtc_write_val() were removed by commit `0f03a43b8f` ("[POWERPC] Remove todc code from ARCH=powerpc") Last user of kgdb_map_scc() was removed by commit `17ce452f7e` ("kgdb, powerpc: arch specific powerpc kgdb support"). ppc.machine_kexec_prepare() has not been used since commit `8ee3e0d696` ("powerpc: Remove the main legacy iSerie platform code"). This allows the removal of machine_kexec_prepare() and the rename of default_machine_kexec_prepare() into machine_kexec_prepare() Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Daniel Axtens <dja@axtens.net> [mpe: Drop prototype for default_machine_kexec_prepare() as noted by dja] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/24d4ca0ada683c9436a5f812a7aeb0a1362afa2b.1630398606.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:05 +11:00
Christophe Leroy	e606a2f46c	powerpc/time: Remove generic_suspend_{dis/en}able_irqs() Commit `d75d68cfef` ("powerpc: Clean up obsolete code relating to decrementer and timebase") made generic_suspend_enable_irqs() and generic_suspend_disable_irqs() static. Fold them into their only caller. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c3f9ec9950394ef939014f7934268e6ee30ca04f.1630398566.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:05 +11:00
Christophe Leroy	566af8cda3	powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC Commit `e65e1fc2d2` ("[PATCH] syscall class hookup for all normal targets") added generic support for AUDIT but that didn't include support for bi-arch like powerpc. Commit `4b58841149` ("audit: Add generic compat syscall support") added generic support for bi-arch. Convert powerpc to that bi-arch generic audit support. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a4b3951d1191d4183d92a07a6097566bde60d00a.1629812058.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:04 +11:00
Christophe Leroy	a85c728cb5	powerpc/32: Don't use lmw/stmw for saving/restoring non volatile regs Instructions lmw/stmw are interesting for functions that are rarely used and not in the cache, because only one instruction is to be copied into the instruction cache instead of 19. However those instruction are less performant than 19x raw lwz/stw as they require synchronisation plus one additional cycle. SAVE_NVGPRS / REST_NVGPRS are used in only a few places which are mostly in interrupts entries/exits and in task switch so they are likely already in the cache. Using standard lwz improves null_syscall selftest by: - 10 cycles on mpc832x. - 2 cycles on mpc8xx. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/316c543b8906712c108985c8463eec09c8db577b.1629732542.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:04 +11:00
Anatolij Gustschin	aed2886a5e	powerpc/5200: dts: fix memory node unit name Fixes build warnings: Warning (unit_address_vs_reg): /memory: node has a reg or ranges property, but no unit name Signed-off-by: Anatolij Gustschin <agust@denx.de> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211013220532.24759-4-agust@denx.de	2021-10-22 15:22:04 +11:00
Anatolij Gustschin	7855b6c66d	powerpc/5200: dts: fix pci ranges warnings Fix ranges property warnings: pci@f0000d00:ranges: 'oneOf' conditional failed, one must be fixed: Signed-off-by: Anatolij Gustschin <agust@denx.de> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211013220532.24759-3-agust@denx.de	2021-10-22 15:22:04 +11:00
Anatolij Gustschin	e9efabc6e4	powerpc/5200: dts: add missing pci ranges Add ranges property to fix build warnings: Warning (pci_bridge): /pci@f0000d00: missing ranges for PCI bridge (or not a bridge) Signed-off-by: Anatolij Gustschin <agust@denx.de> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211013220532.24759-2-agust@denx.de	2021-10-22 15:22:04 +11:00
Gustavo A. R. Silva	61cb9ac66b	powerpc/vas: Fix potential NULL pointer dereference (!ptr && !ptr->foo) strikes again. :) The expression (!ptr && !ptr->foo) is bogus and in case ptr is NULL, it leads to a NULL pointer dereference: ptr->foo. Fix this by converting && to \|\| This issue was detected with the help of Coccinelle, and audited and fixed manually. Fixes: `1a0d0d5ed5` ("powerpc/vas: Add platform specific user window operations") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015050345.GA1161918@embeddedor	2021-10-22 15:22:04 +11:00
Christophe Leroy	49e3d8ea62	powerpc/fsl_booke: Enable STRICT_KERNEL_RWX Enable STRICT_KERNEL_RWX on fsl_booke. For that, we need additional TLBCAMs dedicated to linear mapping, based on the alignment of _sinittext. By default, up to 768 Mbytes of memory are mapped. It uses 3 TLBCAMs of size 256 Mbytes. With a data alignment of 16, we need up to 9 TLBCAMs: 16/16/16/16/64/64/64/256/256 With a data alignment of 4, we need up to 12 TLBCAMs: 4/4/4/4/16/16/16/64/64/64/256/256 With a data alignment of 1, we need up to 15 TLBCAMs: 1/1/1/1/4/4/4/16/16/16/64/64/64/256/256 By default, set a 16 Mbytes alignment as a compromise between memory usage and number of TLBCAMs. This can be adjusted manually when needed. For the time being, it doens't work when the base is randomised. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/29f9e5d2bbbc83ae9ca879265426a6278bf4d5bb.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	d5970045cf	powerpc/fsl_booke: Update of TLBCAMs after init After init, set readonly memory as ROX and set readwrite memory as RWX, if STRICT_KERNEL_RWX is enabled. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/66bef0b9c273e1121706883f3cf5ad0a053d863f.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	0b2859a743	powerpc/fsl_booke: Allocate separate TLBCAMs for readonly memory Reorganise TLBCAM allocation so that when STRICT_KERNEL_RWX is enabled, TLBCAMs are allocated such that readonly memory uses different TLBCAMs. This results in an allocation looking like: Memory CAM mapping: 4/4/4/1/1/1/1/16/16/16/64/64/64/256/256 Mb, residual: 256Mb Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/8ca169bc288261a0e0558712f979023c3a960ebb.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	52bda69ae8	powerpc/fsl_booke: Tell map_mem_in_cams() if init is done In order to be able to call map_mem_in_cams() once more after init for STRICT_KERNEL_RWX, add an argument. For now, map_mem_in_cams() is always called only during init. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3b69a7e0b393b16984ade882a5eae5d727717459.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	a97dd9e2f7	powerpc/fsl_booke: Enable reloading of TLBCAM without switching to AS1 Avoid switching to AS1 when reloading TLBCAM after init for STRICT_KERNEL_RWX. When we setup AS1 we expect the entire accessible memory to be mapped through one entry, this is not the case anymore at the end of init. We are not changing the size of TLBCAMs, only flags, so no need to switch to AS1. So change loadcam_multi() to not switch to AS1 when the given temporary tlb entry in 0. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/a9d517fbfbc940f56103c46b323f6eb8f4485571.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	01116e6e98	powerpc/fsl_booke: Take exec flag into account when setting TLBCAMs Don't force MAS3_SX and MAS3_UX at all time. Take into account the exec flag. While at it, fix a couple of closeby style problems (indent with space and unnecessary parenthesis), it keeps more readability. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5467044e59f27f9fcf709b9661779e3ce5f784f6.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:03 +11:00
Christophe Leroy	3a75fd709c	powerpc/fsl_booke: Rename fsl_booke.c to fsl_book3e.c We have a myriad of CONFIG symbols around different variants of BOOKEs, which would be worth tidying up one day. But at least, make file names and CONFIG option match: We have CONFIG_FSL_BOOKE and CONFIG_PPC_FSL_BOOK3E. fsl_booke.c is selected by and only by CONFIG_PPC_FSL_BOOK3E. So rename it fsl_book3e to reduce confusion. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/5dc871db1f67739319bec11f049ca450da1c13a2.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:02 +11:00
Christophe Leroy	68b44f94d6	powerpc/booke: Disable STRICT_KERNEL_RWX, DEBUG_PAGEALLOC and KFENCE fsl_booke and 44x are not able to map kernel linear memory with pages, so they can't support DEBUG_PAGEALLOC and KFENCE, and STRICT_KERNEL_RWX is also a problem for now. Enable those only on book3s (both 32 and 64 except KFENCE), 8xx and 40x. Fixes: `88df6e90fa` ("[POWERPC] DEBUG_PAGEALLOC for 32-bit") Fixes: `95902e6c88` ("powerpc/mm: Implement STRICT_KERNEL_RWX on PPC32") Fixes: `90cbac0e99` ("powerpc: Enable KFENCE for PPC32") Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d1ad9fdd9b27da3fdfa16510bb542ed51fa6e134.1634292136.git.christophe.leroy@csgroup.eu	2021-10-22 15:22:02 +11:00
Wan Jiabing	7453f501d4	powerpc/kexec_file: Add of_node_put() before goto Fix following coccicheck warning: ./arch/powerpc/kexec/file_load_64.c:698:1-22: WARNING: Function for_each_node_by_type should have of_node_put() before goto Early exits from for_each_node_by_type should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211018015418.10182-1-wanjiabing@vivo.com	2021-10-22 15:22:02 +11:00
Wan Jiabing	915b368f69	powerpc/pseries/iommu: Add of_node_put() before break Fix following coccicheck warning: ./arch/powerpc/platforms/pseries/iommu.c:924:1-28: WARNING: Function for_each_node_with_property should have of_node_put() before break Early exits from for_each_node_with_property should decrement the node reference counter. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Leonardo Bras <leobras.c@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211014075624.16344-1-wanjiabing@vivo.com	2021-10-22 15:22:02 +11:00
Joel Stanley	4f703e7faa	powerpc/s64: Clarify that radix lacks DEBUG_PAGEALLOC The page_alloc.c code will call into __kernel_map_pages() when DEBUG_PAGEALLOC is configured and enabled. As the implementation assumes hash, this should crash spectacularly if not for a bit of luck in __kernel_map_pages(). In this function linear_map_hash_count is always zero, the for loop exits without doing any damage. There are no other platforms that determine if they support debug_pagealloc at runtime. Instead of adding code to mm/page_alloc.c to do that, this change turns the map/unmap into a noop when in radix mode and prints a warning once. Signed-off-by: Joel Stanley <joel@jms.id.au> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> [mpe: Reformat if per Christophe's suggestion] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211013213438.675095-1-joel@jms.id.au	2021-10-22 15:22:02 +11:00
Linus Torvalds	0a3221b658	Merge tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bug exposed by a previous fix, where running guests with certain SMT topologies could crash the host on Power8. - Fix atomic sleep warnings when re-onlining CPUs, when PREEMPT is enabled. Thanks to Nathan Lynch, Srikar Dronamraju, and Valentin Schneider. * tag 'powerpc-5.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/smp: do not decrement idle task preempt count in CPU offline powerpc/idle: Don't corrupt back chain when going idle	2021-10-21 15:30:09 -10:00
Len Baker	5dfbbb668a	KVM: PPC: Replace zero-length array with flexible array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use "flexible array members" [1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Also, make use of the struct_size() helper in kzalloc(). [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/latest/process/deprecated.html#zero-length-and-one-element-arrays Signed-off-by: Len Baker <len.baker@gmx.com> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2021-10-20 18:30:42 -05:00
Rob Herring	41408b22ec	powerpc: Use of_get_cpu_hwid() Replace open coded parsing of CPU nodes' 'reg' property with of_get_cpu_hwid(). Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211006164332.1981454-8-robh@kernel.org	2021-10-20 13:37:24 -05:00
Nathan Lynch	787252a10d	powerpc/smp: do not decrement idle task preempt count in CPU offline With PREEMPT_COUNT=y, when a CPU is offlined and then onlined again, we get: BUG: scheduling while atomic: swapper/1/0/0x00000000 no locks held by swapper/1/0. CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.15.0-rc2+ #100 Call Trace: dump_stack_lvl+0xac/0x108 __schedule_bug+0xac/0xe0 __schedule+0xcf8/0x10d0 schedule_idle+0x3c/0x70 do_idle+0x2d8/0x4a0 cpu_startup_entry+0x38/0x40 start_secondary+0x2ec/0x3a0 start_secondary_prolog+0x10/0x14 This is because powerpc's arch_cpu_idle_dead() decrements the idle task's preempt count, for reasons explained in commit `a7c2bb8279` ("powerpc: Re-enable preemption before cpu_die()"), specifically "start_secondary() expects a preempt_count() of 0." However, since commit `2c669ef697` ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug") and commit `f1a0a376ca` ("sched/core: Initialize the idle task with preemption disabled"), that justification no longer holds. The idle task isn't supposed to re-enable preemption, so remove the vestigial preempt_enable() from the CPU offline path. Tested with pseries and powernv in qemu, and pseries on PowerVM. Fixes: `2c669ef697` ("powerpc/preempt: Don't touch the idle task's preempt_count during hotplug") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015173902.2278118-1-nathanl@linux.ibm.com	2021-10-20 21:38:01 +11:00
Michael Ellerman	496c5fe25c	powerpc/idle: Don't corrupt back chain when going idle In isa206_idle_insn_mayloss() we store various registers into the stack red zone, which is allowed. However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again, to 0(r1), which corrupts the stack back chain. We used to do the same in isa206_idle_insn_mayloss() itself, but we fixed that in `73287caa92` ("powerpc64/idle: Fix SP offsets when saving GPRs"), however we missed that the macro also corrupts the back chain. Corrupting the back chain is bad for debuggability but doesn't necessarily cause a bug. However we recently changed the stack handling in some KVM code, and it now relies on the stack back chain being valid when it returns. The corruption causes that code to return with r1 pointing somewhere in kernel data, at some point LR is restored from the stack and we branch to NULL or somewhere else invalid. Only affects Power8 hosts running KVM guests, with dynamic_mt_modes enabled (which it is by default). The fixes tag below points to the commit that changed the KVM stack handling, exposing this bug. The actual corruption of the back chain has always existed since `948cf67c47` ("powerpc: Add NAP mode support on Power7 in HV mode"). Fixes: `9b4416c509` ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211020094826.3222052-1-mpe@ellerman.id.au	2021-10-20 21:37:58 +11:00
Kajol Jain	26da4abfb3	powerpc/perf: Fix data source encodings for L2.1 and L3.1 accesses Fix the data source encodings to represent L2.1/L3.1(another core's L2/L3 on the same node) accesses properly for power10 and older plaforms. Add new macros(LEVEL/REM) which can be used to add mem_lvl_num and remote field data inside perf_mem_data_src structure. Result in power9 system with patch changes: localhost:~/linux/tools/perf # ./perf mem report \| grep Remote 0.01% 1 252 Remote core, same node L3 or L3 hit [.] 0x0000000000002dd0 producer_consumer [.] 0x00007fff7f25eb90 anon HitM N/A No N/A 0 0 0.01% 1 220 Remote core, same node L3 or L3 hit [.] 0x0000000000002dd0 producer_consumer [.] 0x00007fff77776d90 anon HitM N/A No N/A 0 0 0.01% 1 220 Remote core, same node L3 or L3 hit [.] 0x0000000000002dd0 producer_consumer [.] 0x00007fff817d9410 anon HitM N/A No N/A 0 0 Fixes: `79e96f8f93` ("powerpc/perf: Export memory hierarchy info to user space") Signed-off-by: Kajol Jain <kjain@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211006140654.298352-5-kjain@linux.ibm.com	2021-10-19 17:27:01 +02:00
Andreas Gruenbacher	bb523b406c	gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} Turn fault_in_pages_{readable,writeable} into versions that return the number of bytes not faulted in, similar to copy_to_user, instead of returning a non-zero value when any of the requested pages couldn't be faulted in. This supports the existing users that require all pages to be faulted in as well as new users that are happy if any pages can be faulted in. Rename the functions to fault_in_{readable,writeable} to make sure this change doesn't silently break things. Neither of these functions is entirely trivial and it doesn't seem useful to inline them, so move them to mm/gup.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-10-18 16:33:03 +02:00
Andreas Gruenbacher	0c8eb2884a	powerpc/kvm: Fix kvm_use_magic_page When switching from __get_user to fault_in_pages_readable, commit `9f9eae5ce7` broke kvm_use_magic_page: like __get_user, fault_in_pages_readable returns 0 on success. Fixes: `9f9eae5ce7` ("powerpc/kvm: Prefer fault_in_pages_readable function") Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-10-18 16:31:45 +02:00
Uwe Kleine-König	4141127c44	powerpc/eeh: Use to_pci_driver() instead of pci_dev->driver Struct pci_driver contains a struct device_driver, so for PCI devices, it's easy to convert a device_driver * to a pci_driver * with to_pci_driver(). The device_driver * is in struct device, so we don't need to also keep track of the pci_driver * in struct pci_dev. Replace pdev->driver with to_pci_driver(). This is a step toward removing pci_dev->driver. [bhelgaas: split to separate patch] Link: https://lore.kernel.org/r/20211004125935.2300113-11-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2021-10-18 09:20:15 -05:00
Christoph Hellwig	e41d12f539	mm: don't include <linux/blk-cgroup.h> in <linux/backing-dev.h> There is no need to pull blk-cgroup.h and thus blkdev.h in here, so break the include chain. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-18 06:17:01 -06:00
Linus Torvalds	be9eb2f00f	Merge tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix a bug where guests on P9 with interrupts passed through could get stuck in synchronize_irq(). - Fix a bug in KVM on P8 where secondary threads entering a guest would write outside their allocated stack. - Fix a bug in KVM on P8 where secondary threads could confuse the host offline code and cause the guest or host to crash. Thanks to Cédric Le Goater. * tag 'powerpc-5.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest() powerpc/xive: Discard disabled interrupts in get_irqchip_state()	2021-10-17 18:01:32 -10:00
Michael Ellerman	cdeb5d7d89	KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest We call idle_kvm_start_guest() from power7_offline() if the thread has been requested to enter KVM. We pass it the SRR1 value that was returned from power7_idle_insn() which tells us what sort of wakeup we're processing. Depending on the SRR1 value we pass in, the KVM code might enter the guest, or it might return to us to do some host action if the wakeup requires it. If idle_kvm_start_guest() is able to handle the wakeup, and enter the guest it is supposed to indicate that by returning a zero SRR1 value to us. That was the behaviour prior to commit `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C"), however in that commit the handling of SRR1 was reworked, and the zeroing behaviour was lost. Returning from idle_kvm_start_guest() without zeroing the SRR1 value can confuse the host offline code, causing the guest to crash and other weirdness. Fixes: `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-2-mpe@ellerman.id.au	2021-10-16 00:40:03 +11:00
Michael Ellerman	9b4416c509	KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest() In commit `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") kvm_start_guest() became idle_kvm_start_guest(). The old code allocated a stack frame on the emergency stack, but didn't use the frame to store anything, and also didn't store anything in its caller's frame. idle_kvm_start_guest() on the other hand is written more like a normal C function, it creates a frame on entry, and also stores CR/LR into its callers frame (per the ABI). The problem is that there is no caller frame on the emergency stack. The emergency stack for a given CPU is allocated with: paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE; So emergency_sp actually points to the first address above the emergency stack allocation for a given CPU, we must not store above it without first decrementing it to create a frame. This is different to the regular kernel stack, paca->kstack, which is initialised to point at an initial frame that is ready to use. idle_kvm_start_guest() stores the backchain, CR and LR all of which write outside the allocation for the emergency stack. It then creates a stack frame and saves the non-volatile registers. Unfortunately the frame it creates is not large enough to fit the non-volatiles, and so the saving of the non-volatile registers also writes outside the emergency stack allocation. The end result is that we corrupt whatever is at 0-24 bytes, and 112-248 bytes above the emergency stack allocation. In practice this has gone unnoticed because the memory immediately above the emergency stack happens to be used for other stack allocations, either another CPUs mc_emergency_sp or an IRQ stack. See the order of calls to irqstack_early_init() and emergency_stack_init(). The low addresses of another stack are the top of that stack, and so are only used if that stack is under extreme pressue, which essentially never happens in practice - and if it did there's a high likelyhood we'd crash due to that stack overflowing. Still, we shouldn't be corrupting someone else's stack, and it is purely luck that we aren't corrupting something else. To fix it we save CR/LR into the caller's frame using the existing r1 on entry, we then create a SWITCH_FRAME_SIZE frame (which has space for pt_regs) on the emergency stack with the backchain pointing to the existing stack, and then finally we switch to the new frame on the emergency stack. Fixes: `10d91611f4` ("powerpc/64s: Reimplement book3s idle code in C") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au	2021-10-16 00:39:54 +11:00
Kees Cook	42a20f86dc	sched: Add wrapper for get_wchan() to keep task blocked Having a stable wchan means the process must be blocked and for it to stay that way while performing stack unwinding. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm] Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64] Link: https://lkml.kernel.org/r/20211008111626.332092234@infradead.org	2021-10-15 11:25:14 +02:00
Christophe Leroy	3091f5fc5f	powerpc: Mark .opd section read-only .opd section contains function descriptors used to locate functions in the kernel. If someone is able to modify a function descriptor he will be able to run arbitrary kernel function instead of another. To avoid that, move .opd section inside read-only memory. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/3cd40b682fb6f75bb40947b55ca0bac20cb3f995.1634136222.git.christophe.leroy@csgroup.eu	2021-10-14 21:47:34 +11:00
Athira Rajeev	8f6aca0e0f	powerpc/perf: Fix cycles/instructions as PM_CYC/PM_INST_CMPL in power10 On power9 and earlier platforms, the default event used for cyles and instructions is PM_CYC (0x0001e) and PM_INST_CMPL (0x00002) respectively. These events use two programmable PMCs and by default will count irrespective of the run latch state (idle state). But since they use programmable PMCs, these events can lead to multiplexing with other events, because there are only 4 programmable PMCs. Hence in power10, performance monitoring unit (PMU) driver uses performance monitor counter 5 (PMC5) and performance monitor counter6 (PMC6) for counting instructions and cycles. Currently on power10, the event used for cycles is PM_RUN_CYC (0x600F4) and instructions uses PM_RUN_INST_CMPL (0x500fa). But counting of these events in idle state is controlled by the CC56RUN bit setting in Monitor Mode Control Register0 (MMCR0). If the CC56RUN bit is zero, PMC5/6 will not count when CTRL[RUN] (run latch) is zero. This could lead to missing some counts if a thread is in idle state during system wide profiling. To fix it, set the CC56RUN bit in MMCR0 for power10, which makes PMC5 and PMC6 count instructions and cycles regardless of the run latch state. Since this change make PMC5/6 count as PM_INST_CMPL/PM_CYC, rename the event code 0x600f4 as PM_CYC instead of PM_RUN_CYC and event code 0x500fa as PM_INST_CMPL instead of PM_RUN_INST_CMPL. The changes are only for PMC5/6 event codes and will not affect the behaviour of PM_RUN_CYC/PM_RUN_INST_CMPL if progammed in other PMC's. Fixes: `a64e697cef` ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.cm> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com> [mpe: Tweak change log wording for style and consistency] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211007075121.28497-1-atrajeev@linux.vnet.ibm.com	2021-10-14 21:46:45 +11:00
Kai Song	b616230e23	powerpc/eeh: Fix docstrings in eeh.c We fix the following warnings when building kernel with W=1: arch/powerpc/kernel/eeh.c:598: warning: Function parameter or member 'function' not described in 'eeh_pci_enable' arch/powerpc/kernel/eeh.c:774: warning: Function parameter or member 'edev' not described in 'eeh_set_dev_freset' arch/powerpc/kernel/eeh.c:774: warning: expecting prototype for eeh_set_pe_freset(). Prototype was for eeh_set_dev_freset() instead arch/powerpc/kernel/eeh.c:814: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset_full' arch/powerpc/kernel/eeh.c:944: warning: Function parameter or member 'ops' not described in 'eeh_init' arch/powerpc/kernel/eeh.c:1451: warning: Function parameter or member 'include_passed' not described in 'eeh_pe_reset' arch/powerpc/kernel/eeh.c:1526: warning: Function parameter or member 'func' not described in 'eeh_pe_inject_err' arch/powerpc/kernel/eeh.c:1526: warning: Excess function parameter 'function' described in 'eeh_pe_inject_err' Signed-off-by: Kai Song <songkai01@inspur.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211009041630.4135-1-songkai01@inspur.com	2021-10-13 16:45:33 +11:00
Cédric Le Goater	6ffeb56ee2	powerpc/boot: Use CONFIG_PPC_POWERNV to compile OPAL support CONFIG_PPC64_BOOT_WRAPPER is selected by CPU_LITTLE_ENDIAN which is used to compile support for other platforms such as Microwatt. There is no need for OPAL calls on these. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211011070356.99952-1-clg@kaod.org	2021-10-13 16:45:12 +11:00
Cédric Le Goater	6f779e1d35	powerpc/xive: Discard disabled interrupts in get_irqchip_state() When an interrupt is passed through, the KVM XIVE device calls the set_vcpu_affinity() handler which raises the P bit to mask the interrupt and to catch any in-flight interrupts while routing the interrupt to the guest. On the guest side, drivers (like some Intels) can request at probe time some MSIs and call synchronize_irq() to check that there are no in flight interrupts. This will call the XIVE get_irqchip_state() handler which will always return true as the interrupt P bit has been set on the host side and lock the CPU in an infinite loop. Fix that by discarding disabled interrupts in get_irqchip_state(). Fixes: `da15c03b04` ("powerpc/xive: Implement get_irqchip_state method for XIVE to fix shutdown race") Cc: stable@vger.kernel.org #v5.4+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Tested-by: seeteena <s1seetee@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211011070203.99726-1-clg@kaod.org	2021-10-13 16:38:55 +11:00
Christophe Leroy	602946ec2f	powerpc: Set max_mapnr correctly max_mapnr is used by virt_addr_valid() to check if a linear address is valid. It must only include lowmem PFNs, like other architectures. Problem detected on a system with 1G mem (Only 768M are mapped), with CONFIG_DEBUG_VIRTUAL and CONFIG_TEST_DEBUG_VIRTUAL, it didn't report virt_to_phys(VMALLOC_START), VMALLOC_START being 0xf1000000. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/77d99037782ac4b3c3b0124fc4ae80ce7b760b05.1634035228.git.christophe.leroy@csgroup.eu	2021-10-13 13:28:22 +11:00
Nicholas Piggin	322fda0405	KVM: PPC: Book3S HV: H_ENTER filter out reserved HPTE[B] value The HPTE B field is a 2-bit field with values 0b10 and 0b11 reserved. This field is also taken from the HPTE and used when KVM executes TLBIEs to set the B field of those instructions. Disallow the guest setting B to a reserved value with H_ENTER by rejecting it. This is the same approach already taken for rejecting reserved (unsupported) LLP values. This prevents the guest from being able to induce the host to execute TLBIE with reserved values, which is not known to be a problem with current processors but in theory it could prevent the TLBIE from working correctly in a future processor. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211004145749.1331331-1-npiggin@gmail.com	2021-10-13 13:08:16 +11:00
Uwe Kleine-König	5a72431ec3	powerpc/eeh: Use dev_driver_string() instead of struct pci_dev->driver->name Replace pdev->driver->name by dev_driver_string() for the corresponding struct device. This is a step toward removing pci_dev->driver. Move the function nearer its only user and instead of the ?: operator use a normal "if" which is more readable. Link: https://lore.kernel.org/r/20211004125935.2300113-6-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2021-10-12 17:50:12 -05:00
Athira Rajeev	29908bbf7b	powerpc/perf: Expose instruction and data address registers as part of extended regs Patch adds support to include Sampled Instruction Address Register (SIAR) and Sampled Data Address Register (SDAR) SPRs as part of extended registers. Update the definition of PERF_REG_PMU_MASK_300/31 and PERF_REG_EXTENDED_MAX to include these SPR's. Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211007065505.27809-4-atrajeev@linux.vnet.ibm.com	2021-10-12 18:39:02 +11:00
Athira Rajeev	02b182e674	powerpc/perf: Refactor the code definition of perf reg extended mask PERF_REG_PMU_MASK_300 and PERF_REG_PMU_MASK_31 defines the mask value for extended registers. Current definition of these mask values uses hex constant and does not use registers by name, making it less readable. Patch refactor the macro values by or'ing together the actual register value constants. Also include PERF_REG_EXTENDED_MAX as part of enum definition. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Kajol Jain<kjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211007065505.27809-2-atrajeev@linux.vnet.ibm.com	2021-10-12 18:38:50 +11:00
Linus Torvalds	efb52a7d95	Merge tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "A bit of a big batch, partly because I didn't send any last week, and also just because the BPF fixes happened to land this week. Summary: - Fix a regression hit by the IPR SCSI driver, introduced by the recent addition of MSI domains on pseries. - A big series including 8 BPF fixes, some with potential security impact and the rest various code generation issues. - Fix our program check assembler entry path, which was accidentally jumping into a gas macro and generating strange stack frames, which could confuse find_bug(). - A couple of fixes, and related changes, to fix corner cases in our machine check handling. - Fix our DMA IOMMU ops, which were not always returning the optimal DMA mask, leading to at least one device falling back to 32-bit DMA when it shouldn't. - A fix for KUAP handling on 32-bit Book3S. - Fix crashes seen when kdumping on some pseries systems. Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem, Christoph Hellwig, Johan Almbladh, Stan Johnson" * tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init powerpc/32s: Fix kuap_kernel_restore() powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler powerpc/64s: Fix unrecoverable MCE calling async handler from NMI powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG powerpc/64: warn if local irqs are enabled in NMI or hardirq context powerpc/traps: do not enable irqs in _exception powerpc/64s: fix program check interrupt emergency stack path powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END powerpc/bpf ppc32: Fix JMP32_JSET_K powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC powerpc/security: Add a helper to query stf_barrier type powerpc/bpf: Fix BPF_SUB when imm == 0x80000000 powerpc/bpf: Fix BPF_MOD when imm == 1 powerpc/bpf: Validate branch ranges powerpc/lib: Add helper to check if offset is within conditional branch range powerpc/iommu: Report the correct most efficient DMA mask for PCI devices	2021-10-10 10:12:42 -07:00
Nathan Lynch	f9473a6571	powerpc/pseries/cpuhp: remove obsolete comment from pseries_cpu_die This comment likely refers to the obsolete DLPAR workflow where some resource state transitions were driven more directly from user space utilities, but it also seems to contradict itself: "Change isolate state to Isolate [...]" is at odds with the preceding sentences, and it does not relate at all to the code that follows. Remove it to prevent confusion. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-5-nathanl@linux.ibm.com	2021-10-09 00:16:00 +11:00
Nathan Lynch	fa2a5dfe2d	powerpc/pseries/cpuhp: delete add/remove_by_count code The core DLPAR code supports two actions (add and remove) and three subtypes of action: * By DRC index: the action is attempted on a single specified resource. This is the usual case for processors. * By indexed count: the action is attempted on a range of resources beginning at the specified index. This is implemented only by the memory DLPAR code. * By count: the lower layer (CPU or memory) is responsible for locating the specified number of resources to which the action can be applied. I cannot find any evidence of the "by count" subtype being used by drmgr or qemu for processors. And when I try to exercise this code, the add case does not work: $ ppc64_cpu --smt ; nproc SMT=8 24 $ printf "cpu remove count 2" > /sys/kernel/dlpar $ nproc 8 $ printf "cpu add count 2" > /sys/kernel/dlpar -bash: printf: write error: Invalid argument $ dmesg \| tail -2 pseries-hotplug-cpu: Failed to find enough CPUs (1 of 2) to add dlpar: Could not handle DLPAR request "cpu add count 2" $ nproc 8 $ drmgr -c cpu -a -q 2 # this uses the by-index method Validating CPU DLPAR capability...yes. CPU 1 CPU 17 $ nproc 24 This is because find_drc_info_cpus_to_add() does not increment drc_index appropriately during its search. This is not hard to fix. But the _by_count() functions also have the property that they attempt to roll back all prior operations if the entire request cannot be satisfied, even though the rollback itself can encounter errors. It's not possible to provide transaction-like behavior at this level, and it's undesirable to have code that can only pretend to do that. Any users of these functions cannot know what the state of the system is in the error case. And the error paths are, to my knowledge, impossible to test without adding custom error injection code. Summary: * This code has not worked reliably since its introduction. * There is no evidence that it is used. * It contains questionable rollback behaviors in error paths which are difficult to test. So let's remove it. Fixes: `ac71380071` ("powerpc/pseries: Add CPU dlpar remove functionality") Fixes: `90edf184b9` ("powerpc/pseries: Add CPU dlpar add functionality") Fixes: `b015f6bc95` ("powerpc/pseries: Add cpu DLPAR support for drc-info property") Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-4-nathanl@linux.ibm.com	2021-10-09 00:16:00 +11:00
Nathan Lynch	983f910174	powerpc/cpuhp: BUG -> WARN conversion in offline path If, due to bugs elsewhere, we get into unregister_cpu_online() with a CPU that isn't marked hotpluggable, we can emit a warning and return an appropriate error instead of crashing. Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210927201933.76786-3-nathanl@linux.ibm.com	2021-10-09 00:16:00 +11:00

... 7 8 9 10 11 ...

24645 Commits