linux

Author	SHA1	Message	Date
Michael Neuling	9c2d72d497	selftests/powerpc: Add perf breakpoint test This tests perf hardware breakpoints (ie PERF_TYPE_BREAKPOINT) on powerpc. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 21:16:44 +10:00
Michal Suchanek	a377514519	powerpc/64s: Enhance the information in cpu_show_spectre_v1() We now have barrier_nospec as mitigation so print it in cpu_show_spectre_v1() when enabled. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:46 +10:00
Michael Ellerman	51973a815c	powerpc/64: Use barrier_nospec in syscall entry Our syscall entry is done in assembly so patch in an explicit barrier_nospec. Based on a patch by Michal Suchanek. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:45 +10:00
Michael Ellerman	ddf35cf376	powerpc: Use barrier_nospec in copy_from_user() Based on the x86 commit doing the same. See commit `304ec1b050` ("x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec") and `b3bbfb3fb5` ("x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec") for more detail. In all cases we are ordering the load from the potentially user-controlled pointer vs a previous branch based on an access_ok() check or similar. Based on a patch from Michal Suchanek. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:45 +10:00
Michal Suchanek	cb3d6759a9	powerpc/64s: Enable barrier_nospec based on firmware settings Check what firmware told us and enable/disable the barrier_nospec as appropriate. We err on the side of enabling the barrier, as it's no-op on older systems, see the comment for more detail. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:45 +10:00
Michal Suchanek	815069ca57	powerpc/64s: Patch barrier_nospec in modules Note that unlike RFI which is patched only in kernel the nospec state reflects settings at the time the module was loaded. Iterating all modules and re-patching every time the settings change is not implemented. Based on lwsync patching. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:44 +10:00
Michal Suchanek	2eea7f067f	powerpc/64s: Add support for ori barrier_nospec patching Based on the RFI patching. This is required to be able to disable the speculation barrier. Only one barrier type is supported and it does nothing when the firmware does not enable it. Also, re-patching modules is not supported. So the only meaningful thing that can be done is patching out the speculation barrier at boot when the user says it is not wanted. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:44 +10:00
Michal Suchanek	a6b3964ad7	powerpc/64s: Add barrier_nospec A no-op form of ori (or immediate of 0 into r31 and the result stored in r31) has been re-tasked as a speculation barrier. The instruction only acts as a barrier on newer machines with appropriate firmware support. On older CPUs it remains a harmless no-op. Implement barrier_nospec using this instruction. mpe: The semantics of the instruction are believed to be that it prevents execution of subsequent instructions until preceding branches have been fully resolved and are no longer executing speculatively. There is no further documentation available at this time. Signed-off-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:44 +10:00
Michael Ellerman	7af76c5f23	powerpc/stacktrace: Update copyright This now has new code in it written by Nick and I, and switch to a SPDX tag. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>	2018-06-03 20:43:43 +10:00
Michael Ellerman	5cc05910f2	powerpc/64s: Wire up arch_trigger_cpumask_backtrace() This allows eg. the RCU stall detector, or the soft/hardlockup detectors to trigger a backtrace on all CPUs. We implement this by sending a "safe" NMI, which will actually only send an IPI. Unfortunately the generic code prints "NMI", so that's a little confusing but we can probably live with it. If one of the CPUs doesn't respond to the IPI, we then print some info from its paca and do a backtrace based on its saved_r1. Example output: INFO: rcu_sched detected stalls on CPUs/tasks: 2-...0: (0 ticks this GP) idle=1be/1/4611686018427387904 softirq=1055/1055 fqs=25735 (detected by 4, t=58847 jiffies, g=58, c=57, q=1258) Sending NMI from CPU 4 to CPUs 2: CPU 2 didn't respond to backtrace IPI, inspecting paca. irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3623 (bash) Back trace of paca->saved_r1 (0xc0000000e1c83ba0) (possibly stale): Call Trace: [c0000000e1c83ba0] [0000000000000014] 0x14 (unreliable) [c0000000e1c83bc0] [c000000000765798] lkdtm_do_action+0x48/0x80 [c0000000e1c83bf0] [c000000000765a40] direct_entry+0x110/0x1b0 [c0000000e1c83c90] [c00000000058e650] full_proxy_write+0x90/0xe0 [c0000000e1c83ce0] [c0000000003aae3c] __vfs_write+0x6c/0x1f0 [c0000000e1c83d80] [c0000000003ab214] vfs_write+0xd4/0x240 [c0000000e1c83dd0] [c0000000003ab5cc] ksys_write+0x6c/0x110 [c0000000e1c83e30] [c00000000000b860] system_call+0x58/0x6c Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>	2018-06-03 20:43:43 +10:00
Michael Ellerman	6ba55716a2	powerpc/nmi: Add an API for sending "safe" NMIs Currently the options we have for sending NMIs are not necessarily safe, that is they can potentially interrupt a CPU in a non-recoverable region of code, meaning the kernel must then panic(). But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in situations where we don't want to risk a panic(), because it doesn't have the requirement that interrupts must be enabled like smp_call_function(). So add an API for the caller to indicate that it wants to use the NMI infrastructure, but doesn't want to do anything "unsafe". Currently that is implemented by not actually calling cause_nmi_ipi(), instead falling back to an IPI. In future we can pass the safe parameter down to cause_nmi_ipi() and the individual backends can potentially take it into account before deciding what to do. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>	2018-06-03 20:43:43 +10:00
Michael Ellerman	7b08729cb2	powerpc/64: Save stack pointer when we hard disable interrupts A CPU that gets stuck with interrupts hard disabled can be difficult to debug, as on some platforms we have no way to interrupt the CPU to find out what it's doing. A stop-gap is to have the CPU save its stack pointer (r1) in its paca when it hard disables interrupts. That way if we can't interrupt it, we can at least trace the stack based on where it last disabled interrupts. In some cases that will be total junk, but the stack trace code should handle that. In the simple case of a CPU that disables interrupts and then gets stuck in a loop, the stack trace should be informative. We could clear the saved stack pointer when we enable interrupts, but that loses information which could be useful if we have nothing else to go on. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com>	2018-06-03 20:43:42 +10:00
Michael Ellerman	3e3786801b	powerpc: Check address limit on user-mode return (TIF_FSCHECK) set_fs() sets the addr_limit, which is used in access_ok() to determine if an address is a user or kernel address. Some code paths use set_fs() to temporarily elevate the addr_limit so that kernel code can read/write kernel memory as if it were user memory. That is fine as long as the code can't ever return to userspace with the addr_limit still elevated. If that did happen, then userspace can read/write kernel memory as if it were user memory, eg. just with write(2). In case it's not clear, that is very bad. It has also happened in the past due to bugs. Commit `5ea0727b16` ("x86/syscalls: Check address limit on user-mode return") added a mechanism to check the addr_limit value before returning to userspace. Any call to set_fs() sets a thread flag, TIF_FSCHECK, and if we see that on the return to userspace we go out of line to check that the addr_limit value is not elevated. For further info see the above commit, as well as: https://lwn.net/Articles/722267/ https://bugs.chromium.org/p/project-zero/issues/detail?id=990 Verified to work on 64-bit Book3S using a POC that objdumps the system call handler, and a modified lkdtm_CORRUPT_USER_DS() that doesn't kill the caller. Before: $ sudo ./test-tif-fscheck ... 0000000000000000 <.data>: 0: e1 f7 8a 79 rldicl. r10,r12,30,63 4: 80 03 82 40 bne 0x384 8: 00 40 8a 71 andi. r10,r12,16384 c: 78 0b 2a 7c mr r10,r1 10: 10 fd 21 38 addi r1,r1,-752 14: 08 00 c2 41 beq- 0x1c 18: 58 09 2d e8 ld r1,2392(r13) 1c: 00 00 41 f9 std r10,0(r1) 20: 70 01 61 f9 std r11,368(r1) 24: 78 01 81 f9 std r12,376(r1) 28: 70 00 01 f8 std r0,112(r1) 2c: 78 00 41 f9 std r10,120(r1) 30: 20 00 82 41 beq 0x50 34: a6 42 4c 7d mftb r10 After: $ sudo ./test-tif-fscheck Killed And in dmesg: Invalid address limit on user-mode return WARNING: CPU: 1 PID: 3689 at ../include/linux/syscalls.h:260 do_notify_resume+0x140/0x170 ... 
NIP [c00000000001ee50] do_notify_resume+0x140/0x170 LR [c00000000001ee4c] do_notify_resume+0x13c/0x170 Call Trace: do_notify_resume+0x13c/0x170 (unreliable) ret_from_except_lite+0x70/0x74 Performance overhead is essentially zero in the usual case, because the bit is checked as part of the existing _TIF_USER_WORK_MASK check. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:42 +10:00
Michael Ellerman	ba0635fcbe	powerpc: Rename thread_struct.fs to addr_limit It's called 'fs' for historical reasons, it's named after the x86 'FS' register. But we don't have to use that name for the member of thread_struct, and in fact arch/x86 doesn't even call it 'fs' anymore. So rename it to 'addr_limit', which better reflects what it's used for, and is also the name used on other arches. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:42 +10:00
Al Viro	6bcdd2972b	powerpc/ptrace: Use copy_{from, to}_user() rather than open-coding In PPC_PTRACE_GETHWDBGINFO and PPC_PTRACE_SETHWDEBUG we do an access_ok() check and then __copy_{from,to}_user(). Instead we should just use copy_{from,to}_user() which does all that for us and is less error prone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:41 +10:00
Sam Bobroff	20b3449714	powerpc/eeh: Refactor report functions The EEH report functions now share a fair bit of code around the start and end of each function. So factor out as much as possible, and move the traversal into a custom function. This also allows accurate debug to be generated more easily. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> [mpe: Format with clang-format] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:41 +10:00
Sam Bobroff	665012c573	powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER If a device without a driver is recovered via EEH, the flag EEH_DEV_NO_HANDLER is incorrectly left set on the device after recovery, because the test in eeh_report_resume() for the existence of a bound driver is done before the flag is cleared. If a driver is later bound, and EEH is experienced again, some of the driver's EEH handlers are not called. To correct this, clear the flag unconditionally after EEH processing is complete. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:41 +10:00
Sam Bobroff	010acfa1a7	powerpc/eeh: Introduce eeh_set_irq_state() To ease future refactoring, extract calls to eeh_enable_irq() and eeh_disable_irq() from the various report functions. This makes the report functions initial sequences more similar, as well as making the IRQ changes visible when reading eeh_handle_normal_event(). Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:40 +10:00
Sam Bobroff	47cc8c1cc2	powerpc/eeh: Introduce eeh_set_channel_state() To ease future refactoring, extract setting of the channel state from the report functions out into their own functions. This increases the amount of code that is identical across all of the report functions. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:40 +10:00
Sam Bobroff	e2b810d51b	powerpc/eeh: Introduce eeh_edev_actionable() The same test is done in every EEH report function, so factor it out. Since eeh_dev_removed() needs to be moved higher up in the file, simplify it a little while we're at it. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:40 +10:00
Sam Bobroff	309ed3a715	powerpc/eeh: Introduce eeh_for_each_pe() Add a for_each-style macro for iterating through PEs without the boilerplate required by a traversal function. eeh_pe_next() is now exported, as it is now used directly in place. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:39 +10:00
Sam Bobroff	30424e386a	powerpc/eeh: Clean up pci_ers_result handling As EEH event handling progresses, a cumulative result of type pci_ers_result is built up by (some of) the eeh_report_() functions using either: if (rc == PCI_ERS_RESULT_NEED_RESET) res = rc; if (res == PCI_ERS_RESULT_NONE) res = rc; or: if ((res == PCI_ERS_RESULT_NONE) || (res == PCI_ERS_RESULT_RECOVERED)) res = rc; if (res == PCI_ERS_RESULT_DISCONNECT && rc == PCI_ERS_RESULT_NEED_RESET) res = rc; (Where res is the accumulator.) However, the intent is not immediately clear and the result in some situations is order dependent. Address this by assigning a priority to each result value, and always merging to the highest priority. This renders the intent clear, and provides a stable value for all orderings. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> [mpe: Minor formatting (clang-format)] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:39 +10:00
Sam Bobroff	2eae39f29b	powerpc/eeh: Add message when PE processing at parent To aid debugging, add a message to show when EEH processing for a PE will be done at the device's parent, rather than directly at the device. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:39 +10:00
Sam Bobroff	d6c4932fbf	powerpc/eeh: Strengthen types of eeh traversal functions The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse() both provide their first argument as void * but every single user casts it to the expected type. Change the type of the first parameter from void * to the appropriate type, and clean up all uses. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:38 +10:00
Sam Bobroff	a0bd54641b	powerpc/eeh: Remove unused eeh_pcid_name() Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:38 +10:00
Sam Bobroff	46d4be41b9	powerpc/eeh: Fix use-after-release of EEH driver Correct two cases where eeh_pcid_get() is used to reference the driver's module but the reference is dropped before the driver pointer is used. In eeh_rmv_device() also refactor a little so that only two calls to eeh_pcid_put() are needed, rather than three and the reference isn't taken at all if it wasn't needed. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:38 +10:00
Sam Bobroff	796b9f5b31	powerpc/eeh: Add final message for successful recovery Add a single log line at the end of successful EEH recovery, so that it's clear that event processing has finished. Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:37 +10:00
Anju T Sudhakar	25af86b2ae	powerpc/perf: Unregister thread-imc if core-imc not supported Since thread-imc internally uses the core-imc hardware infrastructure and depends on it, having thread-imc in the kernel in the absence of core-imc is pointless. This patch disables thread-imc if core-imc is not registered. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:37 +10:00
Anju T Sudhakar	e7a8ac4338	powerpc/perf: Return appropriate value for unknown domain Return proper error code for unknown domain during IMC initialization. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:37 +10:00
Anju T Sudhakar	b41bb28b9e	powerpc/perf: Replace the direct return with goto statement Replace the direct return statement in imc_mem_init() with goto, to adhere to the kernel coding style. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:36 +10:00
Anju T Sudhakar	cb094fa5af	powerpc/perf: Rearrange memory freeing in imc init When any of the IMC (In-Memory Collection counter) devices fail to initialize, imc_common_mem_free() frees a set of memory. In doing so, the pmu_ptr pointer is also freed. But pmu_ptr is used in the subsequent function (imc_common_cpuhp_mem_free()), which is wrong. This patch reorders the code to avoid such access. Also free the memory which is dynamically allocated during imc initialization, wherever required. Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:36 +10:00
YueHaibing	589b1f7e4b	powerpc/xics: Add missing of_node_put() in error path The device node obtained with of_find_compatible_node() should be released by calling of_node_put(). But it was not released when of_get_property() failed. Signed-off-by: YueHaibing <yuehaibing@huawei.com> [mpe: Invert the sense of the if so we only need one return path] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:35 +10:00
Fabio Estevam	c5cbde2df3	powerpc: cpm_gpio: Remove owner assignment from platform_driver Structure platform_driver does not need to set the owner field, as this will be populated by the driver core. Generated by scripts/coccinelle/api/platform_no_drv_owner.cocci. Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:35 +10:00
Russell Currey	8a792262f3	powerpc/xive: Remove (almost) unused macros The GETFIELD and SETFIELD macros in xive-regs.h aren't used except for a single instance of GETFIELD, so replace that and remove them. These macros are also defined in vas.h, so either those should be eventually replaced or the macros moved into bitops.h. Signed-off-by: Russell Currey <ruscur@russell.cc> [mpe: Rewrite the assignment to 'he' to avoid ffs() etc.] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:35 +10:00
Stewart Smith	447808bf50	hvc_opal: don't set tb_ticks_per_usec in udbg_init_opal_common() time_init() will set up tb_ticks_per_usec based on reality. time_init() is called after udbg_init_opal_common() during boot. from arch/powerpc/kernel/time.c: unsigned long tb_ticks_per_usec = 100; /* sane default */ Currently, all powernv systems have a timebase frequency of 512mhz (512000000/1000000 == 0x200) - although there's nothing written down anywhere that I can find saying that we couldn't make that different based on the requirements in the ISA. So, we've been (accidentally) thwacking the (currently) correct (for powernv at least) value for tb_ticks_per_usec earlier than we otherwise would have. The "sane default" seems to be adequate for our purposes between udbg_init_opal_common() and time_init() being called, and if it isn't, then we should probably be setting it somewhere that isn't hvc_opal.c! Signed-off-by: Stewart Smith <stewart@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:34 +10:00
Arnd Bergmann	34efabe418	powerpc: remove unused to_tm() helper to_tm() is now completely unused, the only reference being in the _dump_time() helper that is also unused. This removes both, leaving the rest of the powerpc RTC code y2038 safe as far as the hardware supports. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:34 +10:00
Arnd Bergmann	5235afa89a	powerpc: use time64_t in update_persistent_clock update_persistent_clock() is deprecated because it suffers from overflow in 2038 on 32-bit architectures. This changes powerpc to use the update_persistent_clock64() replacement, and to pass down 64-bit timestamps consistently. This is now simpler, as we no longer have to worry about the offset numbers in tm_year and tm_mon that are different between the Linux conventions and RTAS. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:34 +10:00
Arnd Bergmann	5bfd643583	powerpc: use time64_t in read_persistent_clock Looking through the remaining users of the deprecated mktime() function, I found the powerpc rtc handlers, which use it in place of rtc_tm_to_time64(). To clean this up, I'm changing over the read_persistent_clock() function to the read_persistent_clock64() variant, and change all the platform specific handlers along with it. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:33 +10:00
Arnd Bergmann	2dc20f454d	powerpc: rtas: clean up time handling The to_tm() helper function operates on a signed integer for the time, so it will suffer from overflow in 2038, even on 64-bit kernels. Rather than fix that function, this replaces its use in the rtas procfs implementation with the standard rtc_time64_to_tm() helper that is very similar but is not affected by the overflow. In order to actually support long times, the parser function gets changed to 64-bit user input and output as well. Note that the tm_mon and tm_year representation is slightly different, so we have to manually add an offset here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:33 +10:00
Arnd Bergmann	6e8cef384a	powerpc: always enable RTC_LIB In order to use the rtc_tm_to_time64() and rtc_time64_to_tm() helper functions in later patches, we have to ensure that CONFIG_RTC_LIB is always built-in. Note that this symbol only controls a couple of helper functions, not the actual RTC subsystem, which remains optional and is enabled with CONFIG_RTC_CLASS. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:33 +10:00
Olof Johansson	eff06ef089	powerpc/pasemi: Set PCI_SCAN_ALL_PCI_DEVS Needed on Amiga X1000 with SB600. Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Signed-off-by: Olof Johansson <olof@lixom.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:43:32 +10:00
Aneesh Kumar K.V	a5db5060e0	powerpc/mm/hash: hard disable irq in the SLB insert path When inserting SLB entries for EA above 512TB, we need to hard disable irq. This will make sure we don't take a PMU interrupt that can possibly touch user space address via a stack dump. To prevent this, we need to hard disable the interrupt. Also add a comment explaining why we don't need context synchronizing isync with slbmte. Fixes: `f384796c4` ("powerpc/mm: Add support for handling > 512TB address in SLB miss") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:38 +10:00
Aneesh Kumar K.V	ed515b6898	powerpc/mm/hugetlb: Update hugetlb related locks With split pmd page table lock enabled, we don't use mm->page_table_lock when updating pmd entries. This patch updates the hugetlb path to use the right lock when inserting huge page directory entries into the page table. ex: if we are using hugepd and inserting a hugepd entry at the pmd level, we use pmd_lockptr, which, depending on config, can be the split pmd lock. For updating the huge page directory entries themselves we use mm->page_table_lock. We do have a helper huge_pte_lockptr() for that. Fixes: `675d99529` ("powerpc/book3s64: Enable split pmd ptlock") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:37 +10:00
Aneesh Kumar K.V	91d0697188	powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch Currently we do not have an isync, or any other context synchronizing instruction prior to the slbie/slbmte in _switch() that updates the SLB entry for the kernel stack. However that is not correct as outlined in the ISA. From Power ISA Version 3.0B, Book III, Chapter 11, page 1133: "Changing the contents of ... the contents of SLB entries ... can have the side effect of altering the context in which data addresses and instruction addresses are interpreted, and in which instructions are executed and data accesses are performed. ... These side effects need not occur in program order, and therefore may require explicit synchronization by software. ... The synchronizing instruction before the context-altering instruction ensures that all instructions up to and including that synchronizing instruction are fetched and executed in the context that existed before the alteration." And page 1136: "For data accesses, the context synchronizing instruction before the slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures that all preceding instructions that access data storage have completed to a point at which they have reported all exceptions they will cause." We're not aware of any bugs caused by this, but it should be fixed regardless. Add the missing isync when updating kernel stack SLB entry. Cc: stable@vger.kernel.org Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Flesh out change log with more ISA text & explanation] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:37 +10:00
Nicholas Piggin	926bc2f100	powerpc/64s: Fix compiler store ordering to SLB shadow area The stores to update the SLB shadow area must be made as they appear in the C code, so that the hypervisor does not see an entry with mismatched vsid and esid. Use WRITE_ONCE for this. GCC has been observed to elide the first store to esid in the update, which means that if the hypervisor interrupts the guest after storing to vsid, it could see an entry with old esid and new vsid, which may possibly result in memory corruption. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:37 +10:00
Nicholas Piggin	0cef77c779	powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask When a single-threaded process has a non-local mm_cpumask, try to use that point to flush the TLBs out of other CPUs in the cpumask. An IPI is used for clearing remote CPUs for a few reasons: - An IPI can end lazy TLB use of the mm, which is required to prevent TLB entries being created on the remote CPU. The alternative is to drop lazy TLB switching completely, which costs 7.5% in a context switch ping-pong test between a process and kernel idle thread. - An IPI can have remote CPUs flush the entire PID, but the local CPU can flush a specific VA. tlbie would require over-flushing of the local CPU (where the process is running). - A single threaded process that is migrated to a different CPU is likely to have a relatively small mm_cpumask, so IPI is reasonable. No other thread can concurrently switch to this mm, because it must have been given a reference to mm_users by the current thread before it can use_mm. mm_users can be asynchronously incremented (by mm_activate or mmget_not_zero), but those users must use remote mm access and can't use_mm or access user address space. Existing code makes this assumption already, for example sparc64 has reset mm_cpumask using this condition since the start of history, see arch/sparc/kernel/smp_64.c. This reduces tlbies for a kernel compile workload from 0.90M to 0.12M, tlbiels are increased significantly due to the PID flushing for the cleaning up remote CPUs, and increased local flushes (PID flushes take 128 tlbiels vs 1 tlbie). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:36 +10:00
Nicholas Piggin	85bcfaf69c	powerpc/64s/radix: optimise pte_update Implementing pte_update with pte_xchg (which uses cmpxchg) is inefficient. A single larx/stcx. works fine, no need for the less efficient cmpxchg sequence. Then remove the memory barriers from the operation. There is a requirement for TLB flushing to load mm_cpumask after the store that reduces pte permissions, which is moved into the TLB flush code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:36 +10:00
Nicholas Piggin	f1cb8f9beb	powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags The ISA suggests ptesync after setting a pte, to prevent a table walk initiated by a subsequent access from missing that store and causing a spurious fault. This is an architectural allowance that allows an implementation's page table walker to be incoherent with the store queue. However there is no correctness problem in taking a spurious fault in userspace -- the kernel copes with these at any time, so the updated pte will be found eventually. Spurious kernel faults on vmap memory must be avoided, so a ptesync is put into flush_cache_vmap. On POWER9 so far I have not found a measurable window where this can result in more minor faults, so as an optimisation, remove the costly ptesync from pte updates. If an implementation benefits from ptesync, it would be better to add it back in update_mmu_cache, so it's not done for things like fork(2). fork --fork --exec benchmark improved 5.2% (12400->13100). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:36 +10:00
Nicholas Piggin	68662f85f3	powerpc/64s/radix: prefetch user address in update_mmu_cache Prefetch the faulting address in update_mmu_cache to give the page table walker perhaps 100 cycles head start as locks are dropped and the interrupt completed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:35 +10:00
Nicholas Piggin	f569bd94ef	powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case This matches other architectures, when we know there will be no further accesses to the address (e.g., for teardown), page table entries can be cleared non-atomically. The comments about NMMU are bogus: all MMU notifiers (including NMMU) are released at this point, with their TLBs flushed. An NMMU access at this point would be a bug. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2018-06-03 20:40:35 +10:00
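
The priority-based merging described in commit 30424e386a ("powerpc/eeh: Clean up pci_ers_result handling") can be illustrated with a minimal sketch. The enum, the function names (`ers_priority`, `ers_merge`) and the particular priority order below are hypothetical and chosen only for illustration; they are not taken from the kernel source. The point is that once every result value has a fixed priority and the accumulator always keeps the higher-priority value, the merged result no longer depends on the order in which results arrive.

```c
/* Hypothetical result codes, loosely modelled on pci_ers_result. */
enum ers_result {
	ERS_NONE,
	ERS_RECOVERED,
	ERS_CAN_RECOVER,
	ERS_DISCONNECT,
	ERS_NEED_RESET,
};

/* Assumed priority order, for illustration only: higher value wins. */
static int ers_priority(enum ers_result r)
{
	switch (r) {
	case ERS_NONE:        return 1;
	case ERS_RECOVERED:   return 2;
	case ERS_CAN_RECOVER: return 3;
	case ERS_DISCONNECT:  return 4;
	case ERS_NEED_RESET:  return 5;
	}
	return 0; /* unknown values never override a known result */
}

/* Merge a new result into the accumulator, keeping the higher priority. */
static enum ers_result ers_merge(enum ers_result acc, enum ers_result rc)
{
	return ers_priority(rc) > ers_priority(acc) ? rc : acc;
}
```

Because `ers_merge()` is effectively a "max by priority", it is commutative and associative, so a sequence of results yields the same accumulated value in any order, which is the stability property the commit message refers to.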


752980 Commits