linux

Author	SHA1	Message	Date
Nicholas Piggin	d82b392d9b	KVM: PPC: Book3S HV Nested: Fix TM softpatch HFAC interrupt emulation Have the TM softpatch emulation code set up the HFAC interrupt and return -1 in case an instruction was executed with HFSCR bits clear, and have the interrupt exit handler fall through to the HFAC handler. When the L0 is running a nested guest, this ensures the HFAC interrupt is correctly passed up to the L1. The "direct guest" exit handler will turn these into PROGILL program interrupts so functionality in practice will be unchanged. But it's possible an L1 would want to handle these in a different way. Also rearrange the FAC interrupt emulation code to match the HFAC format while here (mainly, adding the FSCR_INTR_CAUSE mask). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-5-npiggin@gmail.com	2021-08-25 16:37:17 +10:00
Nicholas Piggin	4782e0cd0d	KVM: PPC: Book3S HV P9: Fixes for TM softpatch interrupt NIP The softpatch interrupt sets HSRR0 to the faulting instruction +4, so it should subtract 4 for the faulting instruction address in the case it is a TM softpatch interrupt (the instruction was not executed) and it was not emulated. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-4-npiggin@gmail.com	2021-08-25 16:37:17 +10:00
Nicholas Piggin	daac40e8d7	KVM: PPC: Book3S HV: Remove TM emulation from POWER7/8 path TM fake-suspend emulation is only used by POWER9. Remove it from the old code path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-3-npiggin@gmail.com	2021-08-25 16:37:17 +10:00
Nicholas Piggin	fd42b7b09c	KVM: PPC: Book3S HV: Initialise vcpu MSR with MSR_ME It is possible to create a VCPU without setting the MSR before running it, which results in a warning in kvmhv_vcpu_entry_p9() that MSR_ME is not set. This is pretty harmless because the MSR_ME bit is added to HSRR1 before HRFID to guest, and a normal qemu guest doesn't hit it. Initialise the vcpu MSR with MSR_ME set. Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210811160134.904987-2-npiggin@gmail.com	2021-08-25 16:37:17 +10:00
Fabiano Rosas	0eb596f1e6	KVM: PPC: Book3S HV: Stop exporting symbols from book3s_64_mmu_radix The book3s_64_mmu_radix.o object is not part of the KVM builtins and all the callers of the exported symbols are in the same kvm-hv.ko module so we should not need to export any symbols. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805212616.2641017-4-farosas@linux.ibm.com	2021-08-24 23:14:06 +10:00
Fabiano Rosas	c232461c0c	KVM: PPC: Book3S HV: Add sanity check to copy_tofrom_guest Both paths into __kvmhv_copy_tofrom_guest_radix ensure that we arrive with an effective address that is smaller than our total addressable space and addresses quadrant 0. - The H_COPY_TOFROM_GUEST hypercall path rejects the call with H_PARAMETER if the effective address has any of the twelve most significant bits set. - The kvmhv_copy_tofrom_guest_radix path clears the top twelve bits before calling the internal function. Although the callers make sure that the effective address is sane, any future use of the function is exposed to a programming error, so add a sanity check. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805212616.2641017-3-farosas@linux.ibm.com	2021-08-24 23:14:06 +10:00
Fabiano Rosas	5d7d6dac8f	KVM: PPC: Book3S HV: Fix copy_tofrom_guest routines The __kvmhv_copy_tofrom_guest_radix function was introduced along with nested HV guest support. It uses the platform's Radix MMU quadrants to provide a nested hypervisor with fast access to its nested guests memory (H_COPY_TOFROM_GUEST hypercall). It has also since been added as a fast path for the kvmppc_ld/st routines which are used during instruction emulation. The commit `def0bfdbd6` ("powerpc: use probe_user_read() and probe_user_write()") changed the low level copy function from raw_copy_from_user to probe_user_read, which adds a check to access_ok. In powerpc that is: static inline bool __access_ok(unsigned long addr, unsigned long size) { return addr < TASK_SIZE_MAX && size <= TASK_SIZE_MAX - addr; } and TASK_SIZE_MAX is 0x0010000000000000UL for 64-bit, which means that setting the two MSBs of the effective address (which correspond to the quadrant) now cause access_ok to reject the access. This was not caught earlier because the most common code path via kvmppc_ld/st contains a fallback (kvm_read_guest) that is likely to succeed for L1 guests. For nested guests there is no fallback. Another issue is that probe_user_read (now __copy_from_user_nofault) does not return the number of bytes not copied in case of failure, so the destination memory is not being cleared anymore in kvmhv_copy_from_guest_radix: ret = kvmhv_copy_tofrom_guest_radix(vcpu, eaddr, to, NULL, n); if (ret > 0) <-- always false! memset(to + (n - ret), 0, ret); This patch fixes both issues by skipping access_ok and open-coding the low level __copy_to/from_user_inatomic. Fixes: `def0bfdbd6` ("powerpc: use probe_user_read() and probe_user_write()") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805212616.2641017-2-farosas@linux.ibm.com	2021-08-24 23:14:06 +10:00
Jing Zhang	8ccba534a1	KVM: stats: Add halt polling related histogram stats Add three log histogram stats to record the distribution of time spent on successful polling, failed polling and VCPU wait. halt_poll_success_hist: Distribution of spent time for a successful poll. halt_poll_fail_hist: Distribution of spent time for a failed poll. halt_wait_hist: Distribution of time a VCPU has spent on waiting. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-6-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-20 16:06:33 -04:00
Jing Zhang	87bcc5fa09	KVM: stats: Add halt_wait_ns stats for all architectures Add simple stats halt_wait_ns to record the time a VCPU has spent on waiting for all architectures (not just powerpc). Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-20 16:06:33 -04:00
Jing Zhang	f95937ccf5	KVM: stats: Support linear and logarithmic histogram statistics Add new types of KVM stats, linear and logarithmic histogram. Histogram are very useful for observing the value distribution of time or size related stats. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210802165633.1866976-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-20 16:06:32 -04:00
Cédric Le Goater	b352ddae7b	KVM: PPC: Book3S PR: Remove unused variable This fixes a compile error with W=1. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210819125656.14498-5-clg@kaod.org	2021-08-20 22:17:41 +10:00
Cédric Le Goater	cb53a93e33	KVM: PPC: Book3S PR: Declare kvmppc_handle_exit_pr() This fixes a compile error with W=1. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210819125656.14498-4-clg@kaod.org	2021-08-20 22:17:38 +10:00
Lukas Bulwahn	c26d4c5d4f	powerpc/kvm: Remove obsolete and unneeded select Commit `a278e7ea60` ("powerpc: Fix compile issue with force DAWR") selects the non-existing config PPC_DAWR_FORCE_ENABLE for config KVM_BOOK3S_64_HANDLER. As this commit also introduces a config PPC_DAWR and this config PPC_DAWR is selected with PPC if PPC64, there is no need for any further select in the KVM_BOOK3S_64_HANDLER. Remove an obsolete and unneeded select in config KVM_BOOK3S_64_HANDLER. The issue was identified with ./scripts/checkkconfigsymbols.py. Fixes: `a278e7ea60` ("powerpc: Fix compile issue with force DAWR") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210819113954.17515-2-lukas.bulwahn@gmail.com	2021-08-20 12:37:18 +10:00
Aneesh Kumar K.V	dbf77fed8b	powerpc: rename powerpc_debugfs_root to arch_debugfs_dir No functional change in this patch. arch_debugfs_dir is the generic kernel name declared in linux/debugfs.h for arch-specific debugfs directory. Architectures like x86/s390 already use the name. Rename powerpc specific powerpc_debugfs_root to arch_debugfs_dir. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com	2021-08-13 22:04:26 +10:00
Cédric Le Goater	f5af0a9787	KVM: PPC: Book3S HV: XIVE: Add support for automatic save-restore On P10, the feature doing an automatic "save & restore" of a VCPU interrupt context is set by default in OPAL. When a VP context is pulled out, the state of the interrupt registers are saved by the XIVE interrupt controller under the internal NVP structure representing the VP. This saves a costly store/load in guest entries and exits. If OPAL advertises the "save & restore" feature in the device tree, it should also have set the 'H' bit in the CAM line. Check that when vCPUs are connected to their ICP in KVM before going any further. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720134209.256133-3-clg@kaod.org	2021-08-10 23:15:02 +10:00
Cédric Le Goater	b68c6646cc	KVM: PPC: Book3S HV: XIVE: Add a 'flags' field Use it to hold platform specific features. P9 DD2 introduced single-escalation support. P10 will add others. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210720134209.256133-2-clg@kaod.org	2021-08-10 23:15:01 +10:00
Cédric Le Goater	1753081f2d	KVM: PPC: Book3S HV: XICS: Fix mapping of passthrough interrupts PCI MSIs now live in an MSI domain but the underlying calls, which will EOI the interrupt in real mode, need an HW IRQ number mapped in the XICS IRQ domain. Grab it there. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-31-clg@kaod.org	2021-08-10 23:15:01 +10:00
Cédric Le Goater	c325712b5f	powerpc/powernv/pci: Rework pnv_opal_pci_msi_eoi() pnv_opal_pci_msi_eoi() is called from KVM to EOI passthrough interrupts when in real mode. Adding MSI domain broke the hack using the 'ioda.irq_chip' field to deduce the owning PHB. Fix that by using the IRQ chip data in the MSI domain. The 'ioda.irq_chip' field is now unused and could be removed from the pnv_phb struct. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-30-clg@kaod.org	2021-08-10 23:15:01 +10:00
Cédric Le Goater	51be9e51a8	KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the underlying calls handling the passthrough of the interrupt in the guest need a number in the XIVE IRQ domain. Use the IRQ data mapped in the XIVE IRQ domain and not the one in the PCI-MSI domain. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-16-clg@kaod.org	2021-08-10 23:14:59 +10:00
Cédric Le Goater	e5e78b1511	KVM: PPC: Book3S HV: XIVE: Change interface of passthrough interrupt routines The routine kvmppc_set_passthru_irq() calls kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped() with an IRQ descriptor. Use directly the host IRQ number to remove a useless conversion. Add some debug. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-15-clg@kaod.org	2021-08-10 23:14:59 +10:00
Cédric Le Goater	ba418a0278	KVM: PPC: Book3S HV: Use the new IRQ chip to detect passthrough interrupts Passthrough PCI MSI interrupts are detected in KVM with a check on a specific EOI handler (P8) or on XIVE (P9). We can now check the PCI-MSI IRQ chip which is cleaner. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210701132750.1475580-14-clg@kaod.org	2021-08-10 23:14:58 +10:00
Alexey Kardashevskiy	2ac78e0c00	KVM: PPC: Use arch_get_random_seed_long instead of powernv variant The powernv_get_random_long() does not work in nested KVM (which is pseries) and produces a crash when accessing in_be64(rng->regs) in powernv_get_random_long(). This replaces powernv_get_random_long with the ppc_md machine hook wrapper. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210805075649.2086567-1-aik@ozlabs.ru	2021-08-10 23:14:56 +10:00
Sebastian Andrzej Siewior	5ae36401ca	powerpc: Replace deprecated CPU-hotplug functions. The functions get_online_cpus() and put_online_cpus() have been deprecated during the CPU hotplug rework. They map directly to cpus_read_lock() and cpus_read_unlock(). Replace deprecated CPU-hotplug functions with the official version. The behavior remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210803141621.780504-4-bigeasy@linutronix.de	2021-08-10 23:14:56 +10:00
David Matlack	0f22af940d	KVM: Move last_used_slot logic out of search_memslots Make search_memslots unconditionally search all memslots and move the last_used_slot logic up one level to __gfn_to_memslot. This is in preparation for introducing a per-vCPU last_used_slot. As part of this change convert existing callers of search_memslots to __gfn_to_memslot to avoid making any functional changes. Signed-off-by: David Matlack <dmatlack@google.com> Message-Id: <20210804222844.1419481-3-dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-08-06 07:52:28 -04:00
Nicholas Piggin	d9c57d3ed5	KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state The H_ENTER_NESTED hypercall is handled by the L0, and it is a request by the L1 to switch the context of the vCPU over to that of its L2 guest, and return with an interrupt indication. The L1 is responsible for switching some registers to guest context, and the L0 switches others (including all the hypervisor privileged state). If the L2 MSR has TM active, then the L1 is responsible for recheckpointing the L2 TM state. Then the L1 exits to L0 via the H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit, and then it recheckpoints the TM state as part of the nested entry and finally HRFIDs into the L2 with TM active MSR. Not efficient, but about the simplest approach for something that's horrendously complicated. Problems arise if the L1 exits to the L0 with a TM state which does not match the L2 TM state being requested. For example if the L1 is transactional but the L2 MSR is non-transactional, or vice versa. The L0's HRFID can take a TM Bad Thing interrupt and crash. Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then ensuring that if the L1 is suspended then the L2 must have TM active, and if the L1 is not suspended then the L2 must not have TM active. Fixes: `360cae3137` ("KVM: PPC: Book3S HV: Nested guest entry via hypercall") Cc: stable@vger.kernel.org # v4.20+ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Acked-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2021-07-23 16:19:38 +10:00
Nicholas Piggin	f62f3c2064	KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on the rtas_args.nargs that was provided by the guest. That guest nargs value is not range checked, so the guest can cause the host rets pointer to be pointed outside the args array. The individual rtas function handlers check the nargs and nrets values to ensure they are correct, but if they are not, the handlers store a -3 (0xfffffffd) failure indication in rets[0] which corrupts host memory. Fix this by testing up front whether the guest supplied nargs and nret would exceed the array size, and fail the hcall directly without storing a failure indication to rets[0]. Also expand on a comment about why we kill the guest and try not to return errors directly if we have a valid rets[0] pointer. Fixes: `8e591cb720` ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls") Cc: stable@vger.kernel.org # v3.10+ Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>	2021-07-23 16:14:31 +10:00
Nicholas Piggin	bc4188a2f5	KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak vcpu_put is not called if the user copy fails. This can result in preempt notifier corruption and crashes, among other issues. Fixes: `b3cebfe8c1` ("KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210716024310.164448-2-npiggin@gmail.com	2021-07-17 14:33:18 +10:00
Nicholas Piggin	bd31ecf44b	KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash When running CPU_FTR_P9_TM_HV_ASSIST, HFSCR[TM] is set for the guest even if the host has CONFIG_TRANSACTIONAL_MEM=n, which causes it to be unprepared to handle guest exits while transactional. Normal guests don't have a problem because the HTM capability will not be advertised, but a rogue or buggy one could crash the host. Fixes: `4bb3c7a020` ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210716024310.164448-1-npiggin@gmail.com	2021-07-17 14:33:17 +10:00
Nicholas Piggin	e44fbdb680	KVM: PPC: Book3S HV P9: Fix guest TM support The conversion to C introduced several bugs in TM handling that can cause host crashes with TM bad thing interrupts. Mostly just simple typos or missed logic in the conversion that got through due to my not testing TM in the guest sufficiently. - Early TM emulation for the softpatch interrupt should be done if fake suspend mode is _not_ active. - Early TM emulation wants to return immediately to the guest so as to not doom transactions unnecessarily. - And if exiting from the guest, the host MSR should include the TM[S] bit if the guest was T/S, before it is treclaimed. After this fix, all the TM selftests pass when running on a P9 processor that implements TM with softpatch interrupt. Fixes: `89d35b2391` ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210712013650.376325-1-npiggin@gmail.com	2021-07-15 21:53:37 +10:00
Linus Torvalds	019b3fd94b	powerpc updates for 5.14 - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, Zhen Lei. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmDfFS4THG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFxHEAC88NJ+Gz87LiTQFt6QjhziBaJUd+sY uqADPRROr4P50O8PjYZbMi2qbXzOlLkZO4wJWX7jpZ1F9KmbPNqY2shD8h4ahyge F/uqzBW1FXBJfnDEKdU2MzalkeTP+dwxLZyouUamjDCGNLFjOV4x/Fft5otOdXjO k9uO6yoGyOkWYzjC+Y/irNPlIDDByB/+bD92Cb52Y2mXMDDEnx4JzbtkeJW+8udT Sjn3bWzeL+dz5GehjMKwK4+SptNiyQGOgM8FwtnKUMvgzxv04DqCGjr9YC12L2Z7 VoFZc4GzVgtf8DZg4fJ3KG5aG2nH3Tui7jc9lUckdrxixDAZw5wSG7CQ39gFb/5+ 7A4fEJk4Z3h5llibwxAZrC7wV8ZDDXn8oRFzRcOJjfxYaD+ohOOyWHIebwkdiXYx nfYI7sBcScDLXeBvHtDra2GJpbFSpVL3S/QNhhi1vKVNrFSyAgbAybcVL2xPLZ6+ 8Mh7A8xt+hf2bo9AXuYJDo9mwXWfg1093d0kT+AslcRhZioBk18c2AiZLIz0FzuL Ua/e5FPb99x9LSdcZHvaAXBoHT2iTgDyCyDa3gkIesyuRX6ggHoFcVQuvdDcbJ9d H8LK+Tahy1Y+E5b6KdtU8mDEGE+QG+CWLnwQ6YSCaL/MYgaFzNa32Jdj1fmztSBC cttP43kHZ7ljTw== =zo4d -----END PGP SIGNATURE----- Merge tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A big series refactoring parts of our KVM code, and converting some to C. - Support for ARCH_HAS_SET_MEMORY, and ARCH_HAS_STRICT_MODULE_RWX on some CPUs. - Support for the Microwatt soft-core. - Optimisations to our interrupt return path on 64-bit. - Support for userspace access to the NX GZIP accelerator on PowerVM on Power10. - Enable KUAP and KUEP by default on 32-bit Book3S CPUs. - Other smaller features, fixes & cleanups. Thanks to: Andy Shevchenko, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Baokun Li, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Daniel Axtens, Daniel Henrique Barboza, Finn Thain, Geoff Levand, Haren Myneni, Jason Wang, Jiapeng Chong, Joel Stanley, Jordan Niethe, Kajol Jain, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nick Desaulniers, Paul Mackerras, Russell Currey, Sathvika Vasireddy, Shaokun Zhang, Stephen Rothwell, Sudeep Holla, Suraj Jitindar Singh, Tom Rix, Vaibhav Jain, YueHaibing, Zhang Jianhua, and Zhen Lei. * tag 'powerpc-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (218 commits) powerpc: Only build restart_table.c for 64s powerpc/64s: move ret_from_fork etc above __end_soft_masked powerpc/64s/interrupt: clean up interrupt return labels powerpc/64/interrupt: add missing kprobe annotations on interrupt exit symbols powerpc/64: enable MSR[EE] in irq replay pt_regs powerpc/64s/interrupt: preserve regs->softe for NMI interrupts powerpc/64s: add a table of implicit soft-masked addresses powerpc/64e: remove implicit soft-masking and interrupt exit restart logic powerpc/64e: fix CONFIG_RELOCATABLE build warnings powerpc/64s: fix hash page fault interrupt handler powerpc/4xx: Fix setup_kuep() on SMP powerpc/32s: Fix setup_{kuap/kuep}() on SMP powerpc/interrupt: Use names in check_return_regs_valid() powerpc/interrupt: Also use exit_must_hard_disable() on PPC32 powerpc/sysfs: Replace sizeof(arr)/sizeof(arr[0]) with ARRAY_SIZE powerpc/ptrace: Refactor regs_set_return_{msr/ip} powerpc/ptrace: Move set_return_regs_changed() before regs_set_return_{msr/ip} powerpc/stacktrace: Fix spurious "stale" traces in raise_backtrace_ipi() powerpc/pseries/vas: Include irqdomain.h powerpc: mark local variables around longjmp as volatile ...	2021-07-02 12:54:34 -07:00
Linus Torvalds	65090f30ab	Merge branch 'akpm' (patches from Andrew) Merge misc updates from Andrew Morton: "191 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, kernel/watchdog, and mm (gup, pagealloc, slab, slub, kmemleak, dax, debug, pagecache, gup, swap, memcg, pagemap, mprotect, bootmem, dma, tracing, vmalloc, kasan, initialization, pagealloc, and memory-failure)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (191 commits) mm,hwpoison: make get_hwpoison_page() call get_any_page() mm,hwpoison: send SIGBUS with error virutal address mm/page_alloc: split pcp->high across all online CPUs for cpuless nodes mm/page_alloc: allow high-order pages to be stored on the per-cpu lists mm: replace CONFIG_FLAT_NODE_MEM_MAP with CONFIG_FLATMEM mm: replace CONFIG_NEED_MULTIPLE_NODES with CONFIG_NUMA docs: remove description of DISCONTIGMEM arch, mm: remove stale mentions of DISCONIGMEM mm: remove CONFIG_DISCONTIGMEM m68k: remove support for DISCONTIGMEM arc: remove support for DISCONTIGMEM arc: update comment about HIGHMEM implementation alpha: remove DISCONTIGMEM and NUMA mm/page_alloc: move free_the_page mm/page_alloc: fix counting of managed_pages mm/page_alloc: improve memmap_pages dbg msg mm: drop SECTION_SHIFT in code comments mm/page_alloc: introduce vm.percpu_pagelist_high_fraction mm/page_alloc: limit the number of pages on PCP lists when reclaim is active mm/page_alloc: scale the number of pages that are batch freed ...	2021-06-29 17:29:11 -07:00
Linus Torvalds	21edf50948	Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbIg8THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYobZNEAC2wTq3Ishk026va7g5mbQVSvAQyf8G 0msmgJ48lJWVL9a6JUogNcCO7sZCTcAy4CYbuHI6kz1fGZZnNWSCrtEz0rFNAdWE WVR2k8ExR2R73vJm+K50WUMMj8YsefRnIFXWlJdTp+pksr3TZ7Lo70taGUK/6tMo aL0dqvnf7Vb3LG0iIkaHWLF4HnyK/UGqB+121rlL4UhI1/g+3EUxNWNcY5eg/dmc Ym73U1uDsjydp3/3jm8v8NYNtwCDGpujZZc/88RFLjP6PMpF1S9JUvDEt+LHJi0a cdS3RreB78HYXpLg5NtDFJwIegRMLSitvCGPBjHvWBzbifkMsA2zWIb6Cs8VkYys vuPoEGZ0ol+SWvcnSh5Xy36nyr4iGIBhQql47UAaqelSxsYPjvCCSD4yJV3k8hnC ZuDscOekXUMn75qZR0quNdi1SkgKpGZxK73QFbuW3Apl5EgArVai6kq0rbl6zlx6 ACy0SEcevhOcpU6WpqDgrmUBgFr+M8zina8edRELgiFEuWT6pYxKwrN3pT4U5djO e5V3YuNzzwzvtUoXN4AiTlT8gwRiGfgeiEvHpvZBXPNvk5ffS6XzPiV81ZMWiBkb ReoCbqME3PKoxj1VAHJvVXHbcjiPIJeCRdV+5vQSNh1SPSQOmEdWyJtNUDrSkoym QkKKY5jrOhPhlQ== =FIKh -----END PGP SIGNATURE----- Merge tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for the interrupt subsystem: Core changes: - Cleanup and simplification of common code to invoke the low level interrupt flow handlers when this invocation requires irqdomain resolution. Add the necessary core infrastructure. - Provide a proper interface for modular PMU drivers to set the interrupt affinity. - Add a request flag which allows to exclude interrupts from spurious interrupt detection. Useful especially for IPI handlers which always return IRQ_HANDLED which turns the spurious interrupt detection into a pointless waste of CPU cycles. Driver changes: - Bulk convert interrupt chip drivers to the new irqdomain low level flow handler invocation mechanism. - Add device tree bindings for the Renesas R-Car M3-W+ SoC - Enable modular build of the Qualcomm PDC driver - The usual small fixes and improvements" * tag 'irq-core-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) dt-bindings: interrupt-controller: arm,gic-v3: Describe GICv3 optional properties irqchip: gic-pm: Remove redundant error log of clock bulk irqchip/sun4i: Remove unnecessary oom message irqchip/irq-imx-gpcv2: Remove unnecessary oom message irqchip/imgpdc: Remove unnecessary oom message irqchip/gic-v3-its: Remove unnecessary oom message irqchip/gic-v2m: Remove unnecessary oom message irqchip/exynos-combiner: Remove unnecessary oom message irqchip: Bulk conversion to generic_handle_domain_irq() genirq: Move non-irqdomain handle_domain_irq() handling into ARM's handle_IRQ() genirq: Add generic_handle_domain_irq() helper irqchip/nvic: Convert from handle_IRQ() to handle_domain_irq() irqdesc: Fix __handle_domain_irq() comment genirq: Use irq_resolve_mapping() to implement __handle_domain_irq() and co irqdomain: Introduce irq_resolve_mapping() irqdomain: Protect the linear revmap with RCU irqdomain: Cache irq_data instead of a virq number in the revmap irqdomain: Use struct_size() helper when allocating irqdomain irqdomain: Make normal and nomap irqdomains exclusive powerpc: Move the use of irq_domain_add_nomap() behind a config option ...	2021-06-29 12:25:04 -07:00
Liam Howlett	900c83f88a	arch/powerpc/kvm/book3s: use vma_lookup() in kvmppc_hv_setup_htab_rma() Using vma_lookup() removes the requirement to check if the address is within the returned vma. The code is easier to understand and more compact. Link: https://lkml.kernel.org/r/20210521174745.2219620-7-Liam.Howlett@Oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-29 10:53:51 -07:00
Liam Howlett	27a14d287e	arch/powerpc/kvm/book3s_hv_uvmem: use vma_lookup() instead of find_vma_intersection() vma_lookup() finds the vma of a specific address with a cleaner interface and is more readable. Link: https://lkml.kernel.org/r/20210521174745.2219620-6-Liam.Howlett@Oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Davidlohr Bueso <dbueso@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-29 10:53:51 -07:00
Linus Torvalds	36824f198c	ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmDV9UYUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOIRgf/XX8fKLh24RnTOs2ldIu2AfRGVrT4 QMrr8MxhmtukBAszk2xKvBt8/6gkUjdaIC3xqEnVjxaDaUvZaEtP7CQlF5JV45rn iv1zyxUKucXrnIOr+gCioIT7qBlh207zV35ArKioP9Y83cWx9uAs22pfr6g+7RxO h8bJZlJbSG6IGr3voANCIb9UyjU1V/l8iEHqRwhmr/A5rARPfD7g8lfMEQeGkzX6 +/UydX2fumB3tl8e2iMQj6vLVdSOsCkehvpHK+Z33EpkKhan7GwZ2sZ05WmXV/nY QLAYfD10KegoNWl5Ay4GTp4hEAIYVrRJCLC+wnLdc0U8udbfCuTC31LK4w== =NcRh -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This covers all architectures (except MIPS) so I don't expect any other feature pull requests this merge window. ARM: - Add MTE support in guests, complete with tag save/restore interface - Reduce the impact of CMOs by moving them in the page-table code - Allow device block mappings at stage-2 - Reduce the footprint of the vmemmap in protected mode - Support the vGIC on dumb systems such as the Apple M1 - Add selftest infrastructure to support multiple configuration and apply that to PMU/non-PMU setups - Add selftests for the debug architecture - The usual crop of PMU fixes PPC: - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes S390: - new HW facilities for guests - make inline assembly more robust with KASAN and co x86: - Allow userspace to handle emulation errors (unknown instructions) - Lazy allocation of the rmap (host physical -> guest physical address) - Support for virtualizing TSC scaling on VMX machines - Optimizations to avoid shattering huge pages at the beginning of live migration - Support for initializing the PDPTRs without loading them from memory - Many TLB flushing cleanups - Refuse to load if two-stage paging is available but NX is not (this has been a requirement in practice for over a year) - A large series that separates the MMU mode (WP/SMAP/SMEP etc.) from CR0/CR4/EFER, using the MMU mode everywhere once it is computed from the CPU registers - Use PM notifier to notify the guest about host suspend or hibernate - Support for passing arguments to Hyper-V hypercalls using XMM registers - Support for Hyper-V TLB flush hypercalls and enlightened MSR bitmap on AMD processors - Hide Hyper-V hypercalls that are not included in the guest CPUID - Fixes for live migration of virtual machines that use the Hyper-V "enlightened VMCS" optimization of nested virtualization - Bugfixes (not many) Generic: - Support for retrieving statistics without debugfs - Cleanups for the KVM selftests API" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (314 commits) KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled kvm: x86: disable the narrow guest module parameter on unload selftests: kvm: Allows userspace to handle emulation errors. kvm: x86: Allow userspace to handle emulation errors KVM: x86/mmu: Let guest use GBPAGES if supported in hardware and TDP is on KVM: x86/mmu: Get CR4.SMEP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Get CR0.WP from MMU, not vCPU, in shadow page fault KVM: x86/mmu: Drop redundant rsvd bits reset for nested NPT KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic KVM: x86: Enhance comments for MMU roles and nested transition trickiness KVM: x86/mmu: WARN on any reserved SPTE value when making a valid SPTE KVM: x86/mmu: Add helpers to do full reserved SPTE checks w/ generic MMU KVM: x86/mmu: Use MMU's role to determine PTTYPE KVM: x86/mmu: Collapse 32-bit PAE and 64-bit statements for helpers KVM: x86/mmu: Add a helper to calculate root from role_regs KVM: x86/mmu: Add helper to update paging metadata KVM: x86/mmu: Don't update nested guest's paging bitmasks if CR0.PG=0 KVM: x86/mmu: Consolidate reset_rsvds_bits_mask() calls KVM: x86/mmu: Use MMU role_regs to get LA57, and drop vCPU LA57 helper KVM: x86/mmu: Get nested MMU's root level from the MMU's role ...	2021-06-28 15:40:51 -07:00
Jing Zhang	bc9e9e672d	KVM: debugfs: Reuse binary stats descriptors To remove code duplication, use the binary stats descriptors in the implementation of the debugfs interface for statistics. This unifies the definition of statistics for the binary and debugfs interfaces. Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-8-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:29 -04:00
Jing Zhang	ce55c04945	KVM: stats: Support binary stats retrieval for a VCPU Add a VCPU ioctl to get a statistics file descriptor by which a read functionality is provided for userspace to read out VCPU stats header, descriptors and data. Define VCPU statistics descriptors and header for all architectures. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-5-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:19 -04:00
Jing Zhang	fcfe1baedd	KVM: stats: Support binary stats retrieval for a VM Add a VM ioctl to get a statistics file descriptor by which a read functionality is provided for userspace to read out VM stats header, descriptors and data. Define VM statistics descriptors and header for all architectures. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-4-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 18:00:10 -04:00
Jing Zhang	cb082bfab5	KVM: stats: Add fd-based API to read binary stats data This commit defines the API for userspace and prepare the common functionalities to support per VM/VCPU binary stats data readings. The KVM stats now is only accessible by debugfs, which has some shortcomings this change series are supposed to fix: 1. The current debugfs stats solution in KVM could be disabled when kernel Lockdown mode is enabled, which is a potential rick for production. 2. The current debugfs stats solution in KVM is organized as "one stats per file", it is good for debugging, but not efficient for production. 3. The stats read/clear in current debugfs solution in KVM are protected by the global kvm_lock. Besides that, there are some other benefits with this change: 1. All KVM VM/VCPU stats can be read out in a bulk by one copy to userspace. 2. A schema is used to describe KVM statistics. From userspace's perspective, the KVM statistics are self-describing. 3. With the fd-based solution, a separate telemetry would be able to read KVM stats in a less privileged environment. 4. After the initial setup by reading in stats descriptors, a telemetry only needs to read the stats data itself, no more parsing or setup is needed. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: Fuad Tabba <tabba@google.com> Tested-by: Fuad Tabba <tabba@google.com> #arm64 Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-3-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 11:47:57 -04:00
Jing Zhang	0193cc908b	KVM: stats: Separate generic stats from architecture specific ones Generic KVM stats are those collected in architecture independent code or those supported by all architectures; put all generic statistics in a separate structure. This ensures that they are defined the same way in the statistics API which is being added, removing duplication among different architectures in the declaration of the descriptors. No functional change intended. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Ricardo Koller <ricarkol@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Message-Id: <20210618222709.1858088-2-jingzhangos@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-06-24 11:47:56 -04:00
Nicholas Piggin	59dc5bfca0	powerpc/64s: avoid reloading (H)SRR registers if they are still valid When an interrupt is taken, the SRR registers are set to return to where it left off. Unless they are modified in the meantime, or the return address or MSR are modified, there is no need to reload these registers when returning from interrupt. Introduce per-CPU flags that track the validity of SRR and HSRR registers. These are cleared when returning from interrupt, when using the registers for something else (e.g., OPAL calls), when adjusting the return address or MSR of a context, and when context switching (which changes the return address and MSR). This improves the performance of interrupt returns. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Fold in fixup patch from Nick] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210617155116.2167984-5-npiggin@gmail.com	2021-06-25 00:06:55 +10:00
Paolo Bonzini	c3ab0e28a4	Merge branch 'topic/ppc-kvm' of https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux into HEAD - Support for the H_RPT_INVALIDATE hypercall - Conversion of Book3S entry/exit to C - Bug fixes	2021-06-23 07:30:41 -04:00
Michael Ellerman	a736143afd	Merge branch 'topic/ppc-kvm' into next Pull in some more ppc KVM patches we are keeping in our topic branch. In particular this brings in the series to add H_RPT_INVALIDATE.	2021-06-23 00:19:08 +10:00
Nathan Chancellor	51696f39cb	KVM: PPC: Book3S HV: Workaround high stack usage with clang LLVM does not emit optimal byteswap assembly, which results in high stack usage in kvmhv_enter_nested_guest() due to the inlining of byteswap_pt_regs(). With LLVM 12.0.0: arch/powerpc/kvm/book3s_hv_nested.c:289:6: error: stack frame size of 2512 bytes in function 'kvmhv_enter_nested_guest' [-Werror,-Wframe-larger-than=] long kvmhv_enter_nested_guest(struct kvm_vcpu vcpu) ^ 1 error generated. While this gets fixed in LLVM, mark byteswap_pt_regs() as noinline_for_stack so that it does not get inlined and break the build due to -Werror by default in arch/powerpc/. Not inlining saves approximately 800 bytes with LLVM 12.0.0: arch/powerpc/kvm/book3s_hv_nested.c:290:6: warning: stack frame size of 1728 bytes in function 'kvmhv_enter_nested_guest' [-Wframe-larger-than=] long kvmhv_enter_nested_guest(struct kvm_vcpu vcpu) ^ 1 warning generated. Cc: stable@vger.kernel.org # v4.20+ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://github.com/ClangBuiltLinux/linux/issues/1292 Link: https://bugs.llvm.org/show_bug.cgi?id=49610 Link: https://lore.kernel.org/r/202104031853.vDT0Qjqj-lkp@intel.com/ Link: https://gist.github.com/ba710e3703bf45043a31e2806c843ffd Link: https://lore.kernel.org/r/20210621182440.990242-1-nathan@kernel.org	2021-06-23 00:18:30 +10:00
Bharata B Rao	81468083f3	KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall H_RPT_INVALIDATE if available. The availability of this hcall is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions DT property. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-7-bharata@linux.ibm.com	2021-06-22 23:38:28 +10:00
Bharata B Rao	b87cc116c7	KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability Now that we have H_RPT_INVALIDATE fully implemented, enable support for the same via KVM_CAP_PPC_RPT_INVALIDATE KVM capability Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-6-bharata@linux.ibm.com	2021-06-22 23:38:28 +10:00
Bharata B Rao	53324b51c5	KVM: PPC: Book3S HV: Nested support in H_RPT_INVALIDATE Enable support for process-scoped invalidations from nested guests and partition-scoped invalidations for nested guests. Process-scoped invalidations for any level of nested guests are handled by implementing H_RPT_INVALIDATE handler in the nested guest exit path in L0. Partition-scoped invalidation requests are forwarded to the right nested guest, handled there and passed down to L0 for eventual handling. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> [aneesh: Nested guest partition-scoped invalidation changes] Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> [mpe: Squash in fixup patch] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-5-bharata@linux.ibm.com	2021-06-22 23:35:37 +10:00
Bharata B Rao	f0c6fbbb90	KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE H_RPT_INVALIDATE does two types of TLB invalidations: 1. Process-scoped invalidations for guests when LPCR[GTSE]=0. This is currently not used in KVM as GTSE is not usually disabled in KVM. 2. Partition-scoped invalidations that an L1 hypervisor does on behalf of an L2 guest. This is currently handled by H_TLB_INVALIDATE hcall and this new replaces the old that. This commit enables process-scoped invalidations for L1 guests. Support for process-scoped and partition-scoped invalidations from/for nested guests will be added separately. Process scoped tlbie invalidations from L1 and nested guests need RS register for TLBIE instruction to contain both PID and LPID. This patch introduces primitives that execute tlbie instruction with both PID and LPID set in prepartion for H_RPT_INVALIDATE hcall. A description of H_RPT_INVALIDATE follows: int64 /* H_Success: Return code on successful completion / / H_Busy - repeat the call with the same / / H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters / hcall(const uint64 H_RPT_INVALIDATE, / Invalidate RPT translation lookaside information / uint64 id, / PID/LPID to invalidate / uint64 target, / Invalidation target / uint64 type, / Type of lookaside information / uint64 pg_sizes, / Page sizes / uint64 start, / Start of Effective Address (EA) range (inclusive) / uint64 end) / End of EA range (exclusive) / Invalidation targets (target) ----------------------------- Core MMU 0x01 / All virtual processors in the partition / Core local MMU 0x02 / Current virtual processor / Nest MMU 0x04 / All nest/accelerator agents in use by the partition / A combination of the above can be specified, except core and core local. Type of translation to invalidate (type) --------------------------------------- NESTED 0x0001 / invalidate nested guest partition-scope / TLB 0x0002 / Invalidate TLB / PWC 0x0004 / Invalidate Page Walk Cache / PRT 0x0008 / Invalidate caching of Process Table Entries if NESTED is clear / PAT 0x0008 / Invalidate caching of Partition Table Entries if NESTED is set / A combination of the above can be specified. Page size mask (pages) ---------------------- 4K 0x01 64K 0x02 2M 0x04 1G 0x08 All sizes (-1UL) A combination of the above can be specified. All page sizes can be selected with -1. Semantics: Invalidate radix tree lookaside information matching the parameters given. Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are different from the defined values. * Return H_PARAMETER if NESTED is set and pid is not a valid nested LPID allocated to this partition * Return H_P5 if (start, end) doesn't form a valid range. Start and end should be a valid Quadrant address and end > start. * Return H_NotSupported if the partition is not in running in radix translation mode. * May invalidate more translation information than requested. * If start = 0 and end = -1, set the range to cover all valid addresses. Else start and end should be aligned to 4kB (lower 11 bits clear). * If NESTED is clear, then invalidate process scoped lookaside information. Else pid specifies a nested LPID, and the invalidation is performed on nested guest partition table and nested guest partition scope real addresses. * If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and quadrant 0 spaces, Else valid addresses are quadrant 0. * Pages which are fully covered by the range are to be invalidated. Those which are partially covered are considered outside invalidation range, which allows a caller to optimally invalidate ranges that may contain mixed page sizes. * Return H_SUCCESS on success. Signed-off-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210621085003.904767-4-bharata@linux.ibm.com	2021-06-21 22:54:27 +10:00
Suraj Jitindar Singh	77bbbc0cf8	KVM: PPC: Book3S HV: Fix TLB management on SMT8 POWER9 and POWER10 processors The POWER9 vCPU TLB management code assumes all threads in a core share a TLB, and that TLBIEL execued by one thread will invalidate TLBs for all threads. This is not the case for SMT8 capable POWER9 and POWER10 (big core) processors, where the TLB is split between groups of threads. This results in TLB multi-hits, random data corruption, etc. Fix this by introducing cpu_first_tlb_thread_sibling etc., to determine which siblings share TLBs, and use that in the guest TLB flushing code. [npiggin@gmail.com: add changelog and comment] Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210602040441.3984352-1-npiggin@gmail.com	2021-06-21 09:22:34 +10:00
Michael Ellerman	3c53642324	Merge branch 'topic/ppc-kvm' into next Merge some powerpc KVM patches from our topic branch. In particular this brings in Nick's big series rewriting parts of the guest entry/exit path in C. Conflicts: arch/powerpc/kernel/security.c arch/powerpc/kvm/book3s_hv_rmhandlers.S	2021-06-17 16:51:38 +10:00
Christophe Leroy	7235bb3593	powerpc/32s: move CTX_TO_VSID() into mmu-hash.h In order to reuse it in switch_mmu_context(), this patch moves CTX_TO_VSID() macro into asm/book3s/32/mmu-hash.h Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/26b36ef2939234a04b37baf6ffe50cba81f5d1b7.1622708530.git.christophe.leroy@csgroup.eu	2021-06-17 00:09:08 +10:00
Nicholas Piggin	fae5c9f366	KVM: PPC: Book3S HV: remove ISA v3.0 and v3.1 support from P7/8 path POWER9 and later processors always go via the P9 guest entry path now. Remove the remaining support from the P7/8 path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-33-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	0bf7e1b2e9	KVM: PPC: Book3S HV P9: implement hash host / hash guest support Implement support for hash guests under hash host. This has to save and restore the host SLB, and ensure that the MMU is off while switching into the guest SLB. POWER9 and later CPUs now always go via the P9 path. The "fast" guest mode is now renamed to the P9 mode, which is consistent with its functionality and the rest of the naming. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-32-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	079a09a500	KVM: PPC: Book3S HV P9: implement hash guest support Implement hash guest support. Guest entry/exit has to restore and save/clear the SLB, plus several other bits to accommodate hash guests in the P9 path. Radix host, hash guest support is removed from the P7/8 path. The HPT hcalls and faults are not handled in real mode, which is a performance regression. A worst-case fork/exit microbenchmark takes 3x longer after this patch. kbuild benchmark performance is in the noise, but the slowdown is likely to be noticed somewhere. For now, accept this penalty for the benefit of simplifying the P7/8 paths and unifying P9 hash with the new code, because hash is a less important configuration than radix on processors that support it. Hash will benefit from future optimisations to this path, including possibly a faster path to handle such hcalls and interrupts without doing a full exit. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-31-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	ac3c8b41c2	KVM: PPC: Book3S HV P9: Reflect userspace hcalls to hash guests to support PR KVM The reflection of sc 1 interrupts from guest PR=1 to the guest kernel is required to support a hash guest running PR KVM where its guest is making hcalls with sc 1. In preparation for hash guest support, add this hcall reflection to the P9 path. The P7/8 path does this in its realmode hcall handler (sc_1_fast_return). Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-30-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	6165d5dd99	KVM: PPC: Book3S HV: add virtual mode handlers for HPT hcalls and page faults In order to support hash guests in the P9 path (which does not do real mode hcalls or page fault handling), these real-mode hash specific interrupts need to be implemented in virt mode. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-29-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	a9aa86e08b	KVM: PPC: Book3S HV: small pseries_do_hcall cleanup Functionality should not be changed. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-28-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	cbcff8b1c5	KVM: PPC: Book3S HV P9: Allow all P9 processors to enable nested HV All radix guests go via the P9 path now, so there is no need to limit nested HV to processors that support "mixed mode" MMU. Remove the restriction. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-27-npiggin@gmail.com	2021-06-10 22:12:15 +10:00
Nicholas Piggin	2ce008c8b2	KVM: PPC: Book3S HV: Remove unused nested HV tests in XICS emulation Commit `f3c18e9342` ("KVM: PPC: Book3S HV: Use XICS hypercalls when running as a nested hypervisor") added nested HV tests in XICS hypercalls, but not all are required. * icp_eoi is only called by kvmppc_deliver_irq_passthru which is only called by kvmppc_check_passthru which is only caled by kvmppc_read_one_intr. * kvmppc_read_one_intr is only called by kvmppc_read_intr which is only called by the L0 HV rmhandlers code. * kvmhv_rm_send_ipi is called by: - kvmhv_interrupt_vcore which is only called by kvmhv_commence_exit which is only called by the L0 HV rmhandlers code. - icp_send_hcore_msg which is only called by icp_rm_set_vcpu_irq. - icp_rm_set_vcpu_irq which is only called by icp_rm_try_update - icp_rm_set_vcpu_irq is not nested HV safe because it writes to LPCR directly without a kvmhv_on_pseries test. Nested handlers should not in general be using the rm handlers. The important test seems to be in kvmppc_ipi_thread, which sends the virt-mode H_IPI handler kick to use smp_call_function rather than msgsnd. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-26-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	dcbac73a5b	KVM: PPC: Book3S HV: Remove virt mode checks from real mode handlers Now that the P7/8 path no longer supports radix, real-mode handlers do not need to deal with being called in virt mode. This change effectively reverts commit `acde25726b` ("KVM: PPC: Book3S HV: Add radix checks in real-mode hypercall handlers"). It removes a few more real-mode tests in rm hcall handlers, which allows the indirect ops for the xive module to be removed from the built-in xics rm handlers. kvmppc_h_random is renamed to kvmppc_rm_h_random to be a bit more descriptive and consistent with other rm handlers. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-25-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	9769a7fd79	KVM: PPC: Book3S HV: Remove radix guest support from P7/8 path The P9 path now runs all supported radix guest combinations, so remove radix guest support from the P7/8 path. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-24-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	aaae8c7900	KVM: PPC: Book3S HV: Remove support for dependent threads mode on P9 Dependent-threads mode is the normal KVM mode for pre-POWER9 SMT processors, where all threads in a core (or subcore) would run the same partition at the same time, or they would run the host. This design was mandated by MMU state that is shared between threads in a processor, so the synchronisation point is in hypervisor real-mode that has essentially no shared state, so it's safe for multiple threads to gather and switch to the correct mode. It is implemented by having the host unplug all secondary threads and always run in SMT1 mode, and host QEMU threads essentially represent virtual cores that wake these secondary threads out of unplug when the ioctl is called to run the guest. This happens via a side-path that is mostly invisible to the rest of the Linux host and the secondary threads still appear to be unplugged. POWER9 / ISA v3.0 has a more flexible MMU design that is independent per-thread and allows a much simpler KVM implementation. Before the new "P9 fast path" was added that began to take advantage of this, POWER9 support was implemented in the existing path which has support to run in the dependent threads mode. So it was not much work to add support to run POWER9 in this dependent threads mode. The mode is not required by the POWER9 MMU (although "mixed-mode" hash / radix MMU limitations of early processors were worked around using this mode). But it is one way to run SMT guests without running different guests or guest and host on different threads of the same core, so it could avoid or reduce some SMT attack surfaces without turning off SMT entirely. This security feature has some real, if indeterminate, value. However the old path is lagging in features (nested HV), and with this series the new P9 path adds remaining missing features (radix prefetch bug and hash support, in later patches), so POWER9 dependent threads mode support would be the only remaining reason to keep that code in and keep supporting POWER9/POWER10 in the old path. So here we make the call to drop this feature. Remove dependent threads mode support for POWER9 and above processors. Systems can still achieve this security by disabling SMT entirely, but that would generally come at a larger performance cost for guests. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-23-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	2e1ae9cd56	KVM: PPC: Book3S HV: Implement radix prefetch workaround by disabling MMU Rather than partition the guest PID space + flush a rogue guest PID to work around this problem, instead fix it by always disabling the MMU when switching in or out of guest MMU context in HV mode. This may be a bit less efficient, but it is a lot less complicated and allows the P9 path to trivally implement the workaround too. Newer CPUs are not subject to this issue. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-22-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	41f7799176	KVM: PPC: Book3S HV P9: Switch to guest MMU context as late as possible Move MMU context switch as late as reasonably possible to minimise code running with guest context switched in. This becomes more important when this code may run in real-mode, with later changes. Move WARN_ON as early as possible so program check interrupts are less likely to tangle everything up. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-21-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	edba6aff4f	KVM: PPC: Book3S HV P9: Add helpers for OS SPR handling This is a first step to wrapping supervisor and user SPR saving and loading up into helpers, which will then be called independently in bare metal and nested HV cases in order to optimise SPR access. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-20-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	68e3baaca8	KVM: PPC: Book3S HV P9: Move SPR loading after expiry time check This is wasted work if the time limit is exceeded. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-19-npiggin@gmail.com	2021-06-10 22:12:14 +10:00
Nicholas Piggin	a32ed1bb70	KVM: PPC: Book3S HV P9: Improve exit timing accounting coverage The C conversion caused exit timing to become a bit cramped. Expand it to cover more of the entry and exit code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-18-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	6d770e3fe9	KVM: PPC: Book3S HV P9: Read machine check registers while MSR[RI] is 0 SRR0/1, DAR, DSISR must all be protected from machine check which can clobber them. Ensure MSR[RI] is clear while they are live. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-17-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	c00366e237	KVM: PPC: Book3S HV P9: inline kvmhv_load_hv_regs_and_go into __kvmhv_vcpu_entry_p9 Now the initial C implementation is done, inline more HV code to make rearranging things easier. And rename __kvmhv_vcpu_entry_p9 to drop the leading underscores as it's now C, and is now a more complete vcpu entry. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-16-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	89d35b2391	KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C Almost all logic is moved to C, by introducing a new in_guest mode for the P9 path that branches very early in the KVM interrupt handler to P9 exit code. The main P9 entry and exit assembly is now only about 160 lines of low level stack setup and register save/restore, plus a bad-interrupt handler. There are two motivations for this, the first is just make the code more maintainable being in C. The second is to reduce the amount of code running in a special KVM mode, "realmode". In quotes because with radix it is no longer necessarily real-mode in the MMU, but it still has to be treated specially because it may be in real-mode, and has various important registers like PID, DEC, TB, etc set to guest. This is hostile to the rest of Linux and can't use arbitrary kernel functionality or be instrumented well. This initial patch is a reasonably faithful conversion of the asm code, but it does lack any loop to return quickly back into the guest without switching out of realmode in the case of unimportant or easily handled interrupts. As explained in previous changes, handling HV interrupts very quickly in this low level realmode is not so important for P9 performance, and are important to avoid for security, observability, debugability reasons. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-15-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	9dc2babc18	KVM: PPC: Book3S HV P9: Stop handling hcalls in real-mode in the P9 path In the interest of minimising the amount of code that is run in "real-mode", don't handle hcalls in real mode in the P9 path. This requires some new handlers for H_CEDE and xics-on-xive to be added before xive is pulled or cede logic is checked. This introduces a change in radix guest behaviour where radix guests that execute 'sc 1' in userspace now get a privilege fault whereas previously the 'sc 1' would be reflected as a syscall interrupt to the guest kernel. That reflection is only required for hash guests that run PR KVM. Background: In POWER8 and earlier processors, it is very expensive to exit from the HV real mode context of a guest hypervisor interrupt, and switch to host virtual mode. On those processors, guest->HV interrupts reach the hypervisor with the MMU off because the MMU is loaded with guest context (LPCR, SDR1, SLB), and the other threads in the sub-core need to be pulled out of the guest too. Then the primary must save off guest state, invalidate SLB and ERAT, and load up host state before the MMU can be enabled to run in host virtual mode (~= regular Linux mode). Hash guests also require a lot of hcalls to run due to the nature of the MMU architecture and paravirtualisation design. The XICS interrupt controller requires hcalls to run. So KVM traditionally tries hard to avoid the full exit, by handling hcalls and other interrupts in real mode as much as possible. By contrast, POWER9 has independent MMU context per-thread, and in radix mode the hypervisor is in host virtual memory mode when the HV interrupt is taken. Radix guests do not require significant hcalls to manage their translations, and xive guests don't need hcalls to handle interrupts. So it's much less important for performance to handle hcalls in real mode on POWER9. One caveat is that the TCE hcalls are performance critical, real-mode variants introduced for POWER8 in order to achieve 10GbE performance. Real mode TCE hcalls were found to be less important on POWER9, which was able to drive 40GBe networking without them (using the virt mode hcalls) but performance is still important. These hcalls will benefit from subsequent guest entry/exit optimisation including possibly a faster "partial exit" that does not entirely switch to host context to handle the hcall. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-14-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	48013cbc50	KVM: PPC: Book3S HV P9: Move radix MMU switching instructions together Switching the MMU from radix<->radix mode is tricky particularly as the MMU can remain enabled and requires a certain sequence of SPR updates. Move these together into their own functions. This also includes the radix TLB check / flush because it's tied in to MMU switching due to tlbiel getting LPID from LPIDR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-13-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	09512c2916	KVM: PPC: Book3S HV P9: Move xive vcpu context management into kvmhv_p9_guest_entry Move the xive management up so the low level register switching can be pushed further down in a later patch. XIVE MMIO CI operations can run in higher level code with machine checks, tracing, etc., available. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-12-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	6ffe2c6e6d	KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races irq_work's use of the DEC SPR is racy with guest<->host switch and guest entry which flips the DEC interrupt to guest, which could lose a host work interrupt. This patch closes one race, and attempts to comment another class of races. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-11-npiggin@gmail.com	2021-06-10 22:12:13 +10:00
Nicholas Piggin	413679e73b	KVM: PPC: Book3S HV P9: Move setting HDEC after switching to guest LPCR LPCR[HDICE]=0 suppresses hypervisor decrementer exceptions on some processors, so it must be enabled before HDEC is set. Rather than set it in the host LPCR then setting HDEC, move the HDEC update to after the guest MMU context (including LPCR) is loaded. There shouldn't be much concern with delaying HDEC by some 10s or 100s of nanoseconds by setting it a bit later. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-10-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	023c3c96ca	KVM: PPC: Book3S HV P9: implement kvmppc_xive_pull_vcpu in C This is more symmetric with kvmppc_xive_push_vcpu, and has the advantage that it runs with the MMU on. The extra test added to the asm will go away with a future change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-9-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	e2762743c6	KVM: PPC: Book3S 64: Minimise hcall handler calling convention differences This sets up the same calling convention from interrupt entry to KVM interrupt handler for system calls as exists for other interrupt types. This is a better API, it uses a save area rather than SPR, and it has more registers free to use. Using a single common API helps maintain it, and it becomes easier to use in C in a later patch. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-8-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	1b5821c630	KVM: PPC: Book3S 64: move bad_host_intr check to HV handler The bad_host_intr check will never be true with PR KVM, move it to HV code. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-7-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	69fdd67499	KVM: PPC: Book3S 64: Move interrupt early register setup to KVM Like the earlier patch for hcalls, KVM interrupt entry requires a different calling convention than the Linux interrupt handlers set up. Move the code that converts from one to the other into KVM. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-6-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	04ece7b60b	KVM: PPC: Book3S 64: Move hcall early register setup to KVM System calls / hcalls have a different calling convention than other interrupts, so there is code in the KVMTEST to massage these into the same form as other interrupt handlers. Move this work into the KVM hcall handler. This means teaching KVM a little more about the low level interrupt handler setup, PACA save areas, etc., although that's not obviously worse than the current approach of coming up with an entirely different interrupt register / save convention. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-5-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	31c67cfe2a	KVM: PPC: Book3S 64: add hcall interrupt handler Add a separate hcall entry point. This can be used to deal with the different calling convention. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-4-npiggin@gmail.com	2021-06-10 22:12:12 +10:00
Nicholas Piggin	f33e0702d9	KVM: PPC: Book3S 64: Move GUEST_MODE_SKIP test into KVM Move the GUEST_MODE_SKIP logic into KVM code. This is quite a KVM internal detail that has no real need to be in common handlers. Add a comment explaining the what and why of KVM "skip" interrupts. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-3-npiggin@gmail.com	2021-06-10 22:12:11 +10:00
Nicholas Piggin	f36011569b	KVM: PPC: Book3S 64: move KVM interrupt entry to a common entry point Rather than bifurcate the call depending on whether or not HV is possible, and have the HV entry test for PR, just make a single common point which does the demultiplexing. This makes it simpler to add another type of exit handler. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210528090752.3542186-2-npiggin@gmail.com	2021-06-10 22:12:01 +10:00
Marc Zyngier	13a9a5d17d	powerpc: Add missing linux/{of.h,irqdomain.h} include directives A bunch of PPC files are missing the inclusion of linux/of.h and linux/irqdomain.h, relying on transitive inclusion from another file. As we are about to break this dependency, make sure these dependencies are explicit. Signed-off-by: Marc Zyngier <maz@kernel.org>	2021-06-10 13:09:16 +01:00
Linus Torvalds	bd7b12aa60	powerpc fixes for 5.13 #5 Fix our KVM reverse map real-mode handling since we enabled huge vmalloc (in some configurations). Revert a recent change to our IOMMU code which broke some devices. Fix KVM handling of FSCR on P7/P8, which could have possibly let a guest crash it's Qemu. Fix kprobes validation of prefixed instructions across page boundary. Thanks to: Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas, Frederic Barrat, Naveen N. Rao, Nicholas Piggin. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmC8wi8THG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgN42D/4vHCHX4T0CZ/5bwh1RMOoGKM+PFyLe BoA2i8lvUILG1+LOiRJuBnVZiWwKYBqfkkfY4BmQpU3Oe3gjbJJwc9QGGHUDarWn NmMPqVgaO5qXObObKXzBU1Ihq4UQwMhK044srzXcgMYyTnSFNgWQAsvO0+0Cl4K4 uT100AFV4tps8dLCHCq2XVHuQALnHzZah4yQ8i6u1TMN/TK+kXyONrMSCgsQ1mrM dDsT1zVeegj8EuW/n9kXkLNp2YZeatptZB7cPDtojlhCQTsZBcKnYtDq5ScASuwy 7hGjzA2SyWsa6l0Iejoj8tr/ZS8Nutftz3izuhDNLEf4foz0tOWqxbXJayOA5J7w vzs9OSFbT6z/svELSIkRCvfePqUdDdC2MthWoShgv0SoIXj+Y7ABKQRW9B5rLeF5 RiB2kCB+7S/03qjDtn57IlJC6aVoHzglTAdYXuj7guUEsZQrmtsdm1IM4eB0XYyx A9/AMCGSbswT0/IUriO4b9FtWGOJJf1vWv3WeqE63gPxqhyTz1ACqMT/0HLrARJZ /QLZrbuOSMBSGDnmJxy3vzb+3fxGxSGrUcoYc6MiSODuRgf7zHuRJsSDwoftnOTW PXVWPVz9ef0OEmuBJyEgTrO+/g9jjCPw8UJz9EaFzkMHbaoHRuZdo2m8X6zrXQLh AUVlDkkSmblY9w== =KkfQ -----END PGP SIGNATURE----- Merge tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix our KVM reverse map real-mode handling since we enabled huge vmalloc (in some configurations). Revert a recent change to our IOMMU code which broke some devices. Fix KVM handling of FSCR on P7/P8, which could have possibly let a guest crash it's Qemu. Fix kprobes validation of prefixed instructions across page boundary. Thanks to Alexey Kardashevskiy, Christophe Leroy, Fabiano Rosas, Frederic Barrat, Naveen N. Rao, and Nicholas Piggin" * tag 'powerpc-5.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: Revert "powerpc/kernel/iommu: Align size for IOMMU_PAGE_SIZE() to save TCEs" KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path powerpc: Fix reverse map real-mode address lookup with huge vmalloc powerpc/kprobes: Fix validation of prefixed instructions across page boundary	2021-06-06 12:39:36 -07:00
Nicholas Piggin	6ba53317d4	KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path Similar to commit `25edcc50d7` ("KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path"), ensure the P7/8 path saves and restores the host FSCR. The logic explained in that patch actually applies there to the old path well: a context switch can be made before kvmppc_vcpu_run_hv restores the host FSCR and returns. Now both the p9 and the p7/8 paths now save and restore their FSCR, it no longer needs to be restored at the end of kvmppc_vcpu_run_hv Fixes: `b005255e12` ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs") Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com	2021-06-04 22:02:25 +10:00
Linus Torvalds	224478289c	ARM fixes: * Another state update on exit to userspace fix * Prevent the creation of mixed 32/64 VMs * Fix regression with irqbypass not restarting the guest on failed connect * Fix regression with debug register decoding resulting in overlapping access * Commit exception state on exit to usrspace * Fix the MMU notifier return values * Add missing 'static' qualifiers in the new host stage-2 code x86 fixes: * fix guest missed wakeup with assigned devices * fix WARN reported by syzkaller * do not use BIT() in UAPI headers * make the kvm_amd.avic parameter bool PPC fixes: * make halt polling heuristics consistent with other architectures selftests: * various fixes * new performance selftest memslot_perf_test * test UFFD minor faults in demand_paging_test -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCyF0MUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroOHSgf/Q4Hm5e12Bj2xJy6A+iShnrbbT8PW hcIIOA7zGWXfjVYcBV7anbj7CcpzfIz0otcRBABa5mkhj+fb3YmPEb0EzCPi4Hru zxpcpB2w7W7WtUOIKe2EmaT+4Pk6/iLcfr8UMHMqx460akE9OmIg10QNWai3My/3 RIOeakSckBI9e/1TQZbxH66dsLwCT0lLco7i7AWHdFxkzUQyoA34HX5pczOCBsO5 3nXH+/txnRVhqlcyzWLVVGVzFqmpHtBqkIInDOXfUqIoxo/gOhOgF1QdMUEKomxn 5ZFXlL5IXNtr+7yiI67iHX7CWkGZE9oJ04TgPHn6LR6wRnVvc3JInzcB5Q== =ollO -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull KVM fixes from Paolo Bonzini: "ARM fixes: - Another state update on exit to userspace fix - Prevent the creation of mixed 32/64 VMs - Fix regression with irqbypass not restarting the guest on failed connect - Fix regression with debug register decoding resulting in overlapping access - Commit exception state on exit to usrspace - Fix the MMU notifier return values - Add missing 'static' qualifiers in the new host stage-2 code x86 fixes: - fix guest missed wakeup with assigned devices - fix WARN reported by syzkaller - do not use BIT() in UAPI headers - make the kvm_amd.avic parameter bool PPC fixes: - make halt polling heuristics consistent with other architectures selftests: - various fixes - new performance selftest memslot_perf_test - test UFFD minor faults in demand_paging_test" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (44 commits) selftests: kvm: fix overlapping addresses in memslot_perf_test KVM: X86: Kill off ctxt->ud KVM: X86: Fix warning caused by stale emulation context KVM: X86: Use kvm_get_linear_rip() in single-step and #DB/#BP interception KVM: x86/mmu: Fix comment mentioning skip_4k KVM: VMX: update vcpu posted-interrupt descriptor when assigning device KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK KVM: x86: add start_assignment hook to kvm_x86_ops KVM: LAPIC: Narrow the timer latency between wait_lapic_expire and world switch selftests: kvm: do only 1 memslot_perf_test run by default KVM: X86: Use _BITUL() macro in UAPI headers KVM: selftests: add shared hugetlbfs backing source type KVM: selftests: allow using UFFD minor faults for demand paging KVM: selftests: create alias mappings when using shared memory KVM: selftests: add shmem backing source type KVM: selftests: refactor vm_mem_backing_src_type flags KVM: selftests: allow different backing source types KVM: selftests: compute correct demand paging size KVM: selftests: simplify setup_demand_paging error handling KVM: selftests: Print a message if /dev/kvm is missing ...	2021-05-29 06:02:25 -10:00
Nicholas Piggin	1438709e63	KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path Similar to commit `25edcc50d7` ("KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path"), ensure the P7/8 path saves and restores the host FSCR. The logic explained in that patch actually applies there to the old path well: a context switch can be made before kvmppc_vcpu_run_hv restores the host FSCR and returns. Now both the p9 and the p7/8 paths now save and restore their FSCR, it no longer needs to be restored at the end of kvmppc_vcpu_run_hv Fixes: `b005255e12` ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs") Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com	2021-05-28 22:54:27 +10:00
Nicholas Piggin	5362a4b6ee	powerpc: Fix reverse map real-mode address lookup with huge vmalloc real_vmalloc_addr() does not currently work for huge vmalloc, which is what the reverse map can be allocated with for radix host, hash guest. Extract the hugepage aware equivalent from eeh code into a helper, and convert existing sites including this one to use it. Fixes: `8abddd968a` ("powerpc/64s/radix: Enable huge vmalloc mappings") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210526120005.3432222-1-npiggin@gmail.com	2021-05-28 22:54:27 +10:00
Wanpeng Li	6bd5b74368	KVM: PPC: exit halt polling on need_resched() This is inspired by commit `262de4102c` (kvm: exit halt polling on need_resched() as well). Due to PPC implements an arch specific halt polling logic, we have to the need_resched() check there as well. This patch adds a helper function that can be shared between book3s and generic halt-polling loops. Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org> Cc: Ben Segall <bsegall@google.com> Cc: Venkatesh Srinivas <venkateshs@chromium.org> Cc: Jim Mattson <jmattson@google.com> Cc: David Matlack <dmatlack@google.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1621339235-11131-1-git-send-email-wanpengli@tencent.com> [Make the function inline. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-27 07:45:50 -04:00
Michael Ellerman	da3bb206c9	KVM: PPC: Book3S HV: Fix kvm_unmap_gfn_range_hv() for Hash MMU Commit `32b48bf851` ("KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks") fixed kvm_unmap_gfn_range_hv() by adding a for loop over each gfn in the range. But for the Hash MMU it repeatedly calls kvm_unmap_rmapp() with the first gfn of the range, rather than iterating through the range. This exhibits as strange guest behaviour, sometimes crashing in firmare, or booting and then guest userspace crashing unexpectedly. Fix it by passing the iterator, gfn, to kvm_unmap_rmapp(). Fixes: `32b48bf851` ("KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks") Reviewed-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210511105459.800788-1-mpe@ellerman.id.au	2021-05-12 11:07:39 +10:00
Nicholas Piggin	34114136f7	KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks Commit `b1c5356e87` ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") causes unmap_gfn_range and age_gfn callbacks to only work on the first gfn in the range. It also makes the aging callbacks call into both radix and hash aging functions for radix guests. Fix this. Add warnings for the single-gfn calls that have been converted to range callbacks, in case they ever receieve ranges greater than 1. Fixes: `b1c5356e87` ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") Reported-by: Bharata B Rao <bharata@linux.ibm.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Message-Id: <20210505121509.1470207-1-npiggin@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-05-07 06:06:20 -04:00
Nicholas Piggin	32b48bf851	KVM: PPC: Book3S HV: Fix conversion to gfn-based MMU notifier callbacks Commit `b1c5356e87` ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") causes unmap_gfn_range and age_gfn callbacks to only work on the first gfn in the range. It also makes the aging callbacks call into both radix and hash aging functions for radix guests. Fix this. Add warnings for the single-gfn calls that have been converted to range callbacks, in case they ever receieve ranges greater than 1. Fixes: `b1c5356e87` ("KVM: PPC: Convert to the gfn-based MMU notifier callbacks") Reported-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Bharata B Rao <bharata@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210505121509.1470207-1-npiggin@gmail.com	2021-05-06 00:25:42 +10:00
Linus Torvalds	152d32aa84	ARM: - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - Some selftests improvements -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmCJ13kUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroM1HAgAqzPxEtiTPTFeFJV5cnPPJ3dFoFDK y/juZJUQ1AOtvuWzzwuf175ewkv9vfmtG6rVohpNSkUlJYeoc6tw7n8BTTzCVC1b c/4Dnrjeycr6cskYlzaPyV6MSgjSv5gfyj1LA5UEM16LDyekmaynosVWY5wJhju+ Bnyid8l8Utgz+TLLYogfQJQECCrsU0Wm//n+8TWQgLf1uuiwshU5JJe7b43diJrY +2DX+8p9yWXCTz62sCeDWNahUv8AbXpMeJ8uqZPYcN1P0gSEUGu8xKmLOFf9kR7b M4U1Gyz8QQbjd2lqnwiWIkvRLX6gyGVbq2zH0QbhUe5gg3qGUX7JjrhdDQ== =AXUi -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "This is a large update by KVM standards, including AMD PSP (Platform Security Processor, aka "AMD Secure Technology") and ARM CoreSight (debug and trace) changes. ARM: - CoreSight: Add support for ETE and TRBE - Stage-2 isolation for the host kernel when running in protected mode - Guest SVE support when running in nVHE mode - Force W^X hypervisor mappings in nVHE mode - ITS save/restore for guests using direct injection with GICv4.1 - nVHE panics now produce readable backtraces - Guest support for PTP using the ptp_kvm driver - Performance improvements in the S2 fault handler x86: - AMD PSP driver changes - Optimizations and cleanup of nested SVM code - AMD: Support for virtual SPEC_CTRL - Optimizations of the new MMU code: fast invalidation, zap under read lock, enable/disably dirty page logging under read lock - /dev/kvm API for AMD SEV live migration (guest API coming soon) - support SEV virtual machines sharing the same encryption context - support SGX in virtual machines - add a few more statistics - improved directed yield heuristics - Lots and lots of cleanups Generic: - Rework of MMU notifier interface, simplifying and optimizing the architecture-specific code - a handful of "Get rid of oprofile leftovers" patches - Some selftests improvements" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (379 commits) KVM: selftests: Speed up set_memory_region_test selftests: kvm: Fix the check of return value KVM: x86: Take advantage of kvm_arch_dy_has_pending_interrupt() KVM: SVM: Skip SEV cache flush if no ASIDs have been used KVM: SVM: Remove an unnecessary prototype declaration of sev_flush_asids() KVM: SVM: Drop redundant svm_sev_enabled() helper KVM: SVM: Move SEV VMCB tracking allocation to sev.c KVM: SVM: Explicitly check max SEV ASID during sev_hardware_setup() KVM: SVM: Unconditionally invoke sev_hardware_teardown() KVM: SVM: Enable SEV/SEV-ES functionality by default (when supported) KVM: SVM: Condition sev_enabled and sev_es_enabled on CONFIG_KVM_AMD_SEV=y KVM: SVM: Append "_enabled" to module-scoped SEV/SEV-ES control variables KVM: SEV: Mask CPUID[0x8000001F].eax according to supported features KVM: SVM: Move SEV module params/variables to sev.c KVM: SVM: Disable SEV/SEV-ES if NPT is disabled KVM: SVM: Free sev_asid_bitmap during init if SEV setup fails KVM: SVM: Zero out the VMCB array used to track SEV ASID association x86/sev: Drop redundant and potentially misleading 'sev_enabled' KVM: x86: Move reverse CPUID helpers to separate header file KVM: x86: Rename GPR accessors to make mode-aware variants the defaults ...	2021-05-01 10:14:08 -07:00
Michael Ellerman	ee1bc694fb	powerpc/kvm: Fix build error when PPC_MEM_KEYS/PPC_PSERIES=n lkp reported a randconfig failure: In file included from arch/powerpc/include/asm/book3s/64/pkeys.h:6, from arch/powerpc/kvm/book3s_64_mmu_host.c:15: arch/powerpc/include/asm/book3s/64/hash-pkey.h: In function 'hash__vmflag_to_pte_pkey_bits': >> arch/powerpc/include/asm/book3s/64/hash-pkey.h:10:23: error: 'VM_PKEY_BIT0' undeclared 10 \| return (((vm_flags & VM_PKEY_BIT0) ? H_PTE_PKEY_BIT0 : 0x0UL) \| \| ^~~~~~~~~~~~ We added the include of book3s/64/pkeys.h for pte_to_hpte_pkey_bits(), but that header on its own should only be included when PPC_MEM_KEYS=y. Instead include linux/pkeys.h, which brings in the right definitions when PPC_MEM_KEYS=y and also provides empty stubs when PPC_MEM_KEYS=n. Fixes: `e4e8bc1df6` ("powerpc/kvm: Fix PR KVM with KUAP/MEM_KEYS enabled") Cc: stable@vger.kernel.org # v5.11+ Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210425115831.2818434-1-mpe@ellerman.id.au	2021-04-27 10:48:37 +10:00
Michael Ellerman	e4e8bc1df6	powerpc/kvm: Fix PR KVM with KUAP/MEM_KEYS enabled The changes to add KUAP support with the hash MMU broke booting of KVM PR guests. The symptom is no visible progress of the guest, or possibly just "SLOF" being printed to the qemu console. Host code is still executing, but breaking into xmon might show a stack trace such as: __might_fault+0x84/0xe0 (unreliable) kvm_read_guest+0x1c8/0x2f0 [kvm] kvmppc_ld+0x1b8/0x2d0 [kvm] kvmppc_load_last_inst+0x50/0xa0 [kvm] kvmppc_exit_pr_progint+0x178/0x220 [kvm_pr] kvmppc_handle_exit_pr+0x31c/0xe30 [kvm_pr] after_sprg3_load+0x80/0x90 [kvm_pr] kvmppc_vcpu_run_pr+0x104/0x260 [kvm_pr] kvmppc_vcpu_run+0x34/0x48 [kvm] kvm_arch_vcpu_ioctl_run+0x340/0x450 [kvm] kvm_vcpu_ioctl+0x2ac/0x8c0 [kvm] sys_ioctl+0x320/0x1060 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c Bisect points to commit `b2ff33a10c` ("powerpc/book3s64/hash/kuap: Enable kuap on hash"), but that's just the commit that enabled KUAP with hash and made the bug visible. The root cause seems to be that KVM PR is creating kernel mappings that don't use the correct key, since we switched to using key 3. We have a helper for adding the right key value, however it's designed to take a pteflags variable, which the KVM code doesn't have. But we can make it work by passing 0 for the pteflags, and tell it explicitly that it should use the kernel key. With that changed guests boot successfully. Fixes: `d94b827e89` ("powerpc/book3s64/kuap: Use Key 3 for kernel mapping with hash translation") Cc: stable@vger.kernel.org # v5.11+ Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210419120139.1455937-1-mpe@ellerman.id.au	2021-04-20 14:22:24 +10:00
Sean Christopherson	b1c5356e87	KVM: PPC: Convert to the gfn-based MMU notifier callbacks Move PPC to the gfn-base MMU notifier APIs, and update all 15 bajillion PPC-internal hooks to work with gfns instead of hvas. No meaningful functional change intended, though the exact order of operations is slightly different since the memslot lookups occur before calling into arch code. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210402005658.3024832-6-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:31:07 -04:00
Sean Christopherson	501b918525	KVM: Move arm64's MMU notifier trace events to generic code Move arm64's MMU notifier trace events into common code in preparation for doing the hva->gfn lookup in common code. The alternative would be to trace the gfn instead of hva, but that's not obviously better and could also be done in common code. Tracing the notifiers is also quite handy for debug regardless of architecture. Remove a completely redundant tracepoint from PPC e500. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20210326021957.1424875-10-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-04-17 08:30:56 -04:00
Nicholas Piggin	732f21a305	KVM: PPC: Book3S HV: Ensure MSR[HV] is always clear in guest MSR Rather than clear the HV bit from the MSR at guest entry, make it clear that the hypervisor does not allow the guest to set the bit. The HV clear is kept in guest entry for now, but a future patch will warn if it is set. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-13-npiggin@gmail.com	2021-04-12 13:36:24 +10:00
Nicholas Piggin	946cf44ac6	KVM: PPC: Book3S HV: Ensure MSR[ME] is always set in guest MSR Rather than add the ME bit to the MSR at guest entry, make it clear that the hypervisor does not allow the guest to clear the bit. The ME set is kept in guest entry for now, but a future patch will warn if it's not present. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-12-npiggin@gmail.com	2021-04-12 13:36:24 +10:00
Nicholas Piggin	0fd85cb83f	KVM: PPC: Book3S HV: Fix CONFIG_SPAPR_TCE_IOMMU=n default hcalls This config option causes the warning in init_default_hcalls to fire because the TCE handlers are in the default hcall list but not implemented. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-9-npiggin@gmail.com	2021-04-12 13:36:24 +10:00
Nicholas Piggin	6c12c4376b	KVM: PPC: Book3S HV: remove unused kvmppc_h_protect argument The va argument is not used in the function or set by its asm caller, so remove it to be safe. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-8-npiggin@gmail.com	2021-04-12 13:36:23 +10:00
Nicholas Piggin	4b5f0a0d49	KVM: PPC: Book3S HV: Remove redundant mtspr PSPB This SPR is set to 0 twice when exiting the guest. Suggested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Daniel Axtens <dja@axtens.net> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-7-npiggin@gmail.com	2021-04-12 13:36:23 +10:00
Nicholas Piggin	72c1528721	KVM: PPC: Book3S HV: Prevent radix guests setting LPCR[TC] Prevent radix guests setting LPCR[TC]. This bit only applies to hash partitions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-6-npiggin@gmail.com	2021-04-12 13:36:23 +10:00
Nicholas Piggin	bcc92a0d6d	KVM: PPC: Book3S HV: Disallow LPCR[AIL] to be set to 1 or 2 These are already disallowed by H_SET_MODE from the guest, also disallow these by updating LPCR directly. AIL modes can affect the host interrupt behaviour while the guest LPCR value is set, so filter it here too. Suggested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-5-npiggin@gmail.com	2021-04-12 13:36:23 +10:00
Nicholas Piggin	67145ef496	KVM: PPC: Book3S HV: Add a function to filter guest LPCR bits Guest LPCR depends on hardware type, and future changes will add restrictions based on errata and guest MMU mode. Move this logic to a common function and use it for the cases where the guest wants to update its LPCR (or the LPCR of a nested guest). This also adds a warning in other places that set or update LPCR if we try to set something that would have been disallowed by the filter, as a sanity check. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-4-npiggin@gmail.com	2021-04-12 13:36:23 +10:00
Nicholas Piggin	a19b70abc6	KVM: PPC: Book3S HV: Nested move LPCR sanitising to sanitise_hv_regs This will get a bit more complicated in future patches. Move it into the helper function. This change allows the L1 hypervisor to determine some of the LPCR bits that the L0 is using to run it, which could be a privilege violation (LPCR is HV-privileged), although the same problem exists now for HFSCR for example. Discussion of the HV privilege issue is ongoing and can be resolved with a later change. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-3-npiggin@gmail.com	2021-04-12 13:36:23 +10:00
Nicholas Piggin	5088eb4092	KVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit The host CTRL (runlatch) value is not restored after guest exit. The host CTRL should always be 1 except in CPU idle code, so this can result in the host running with runlatch clear, and potentially switching to a different vCPU which then runs with runlatch clear as well. This has little effect on P9 machines, CTRL is only responsible for some PMU counter logic in the host and so other than corner cases of software relying on that, or explicitly reading the runlatch value (Linux does not appear to be affected but it's possible non-Linux guests could be), there should be no execution correctness problem, though it could be used as a covert channel between guests. There may be microcontrollers, firmware or monitoring tools that sample the runlatch value out-of-band, however since the register is writable by guests, these values would (should) not be relied upon for correct operation of the host, so suboptimal performance or incorrect reporting should be the worst problem. Fixes: `95a6432ce9` ("KVM: PPC: Book3S HV: Streamlined guest entry/exit path on P9 for radix guests") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210412014845.1517916-2-npiggin@gmail.com	2021-04-12 13:36:22 +10:00
Linus Torvalds	d94d14008e	x86: - take into account HVA before retrying on MMU notifier race - fixes for nested AMD guests without NPT - allow INVPCID in guest without PCID - disable PML in hardware when not in use - MMU code cleanups -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmA3eMQUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroP6TQf5ARpUyq3oo+13albwg+zNca6hzR8i Vl7dpoR3bSJCN3sTYFnlL9eXw5TxgeUL2nqKqma6ddZDNDEBLT2Bq8rcFkbi4pUf n7av76EEq74HW/jlUhKVug7Q5Dm5DiKC6BOH3RVuKHbr6iZseyF3jXZSX0Ppf0yF gvoy6cGyMW60NVLN5tuGeOjVQ1fxziE0SqB90fXuiWgZ5rzIBfbqJV7EOOZsGO67 /LHSaEpvKutsc2a+Hx76yQNJjAbb2/O+4Bo5/RqfdqS5tRLGBzYggdJjLvAPvd6P pTNtDCnErvBZQfMedEQyHYuBL2Ca59fOp6i/ekOM2I+m7816+kSkdTMt2g== =iMHY -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull more KVM updates from Paolo Bonzini: "x86: - take into account HVA before retrying on MMU notifier race - fixes for nested AMD guests without NPT - allow INVPCID in guest without PCID - disable PML in hardware when not in use - MMU code cleanups: * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (28 commits) KVM: SVM: Fix nested VM-Exit on #GP interception handling KVM: vmx/pmu: Fix dummy check if lbr_desc->event is created KVM: x86/mmu: Consider the hva in mmu_notifier retry KVM: x86/mmu: Skip mmu_notifier check when handling MMIO page fault KVM: Documentation: rectify rst markup in KVM_GET_SUPPORTED_HV_CPUID KVM: nSVM: prepare guest save area while is_guest_mode is true KVM: x86/mmu: Remove a variety of unnecessary exports KVM: x86: Fold "write-protect large" use case into generic write-protect KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging KVM: x86: Further clarify the logic and comments for toggling log dirty KVM: x86: Move MMU's PML logic to common code KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect() KVM: nVMX: Disable PML in hardware when running L2 KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs KVM: x86/mmu: Pass the memslot to the rmap callbacks KVM: x86/mmu: Split out max mapping level calculation to helper KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE and HugeTLB pages KVM: nVMX: no need to undo inject_page_fault change on nested vmexit ...	2021-02-26 10:00:12 -08:00
Linus Torvalds	b12b472496	powerpc updates for 5.12 A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. Conversion of the 32-bit syscall handling into C. A series from Nick to streamline our TLB flushing when using the Radix MMU. Switch to using queued spinlocks by default for 64-bit server CPUs. A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, Zheng Yongjun. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmAzMagTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgAbBD/wMS2g1Q9oAGZPsx2NGd2RoeAauGxUs Yj6cZVmR+oa6sJyFYgEG7dT7tcwJITQxLBD3HpsHSnJ/rLrMloE33+cZNA9c4STz 0mlzm3R7M5pOgcEqZglsgLP0RQeUuHSSF01g0kf1N3r+HYtmbmPjuUIl8CnAjlbT iMD2ZN2p8/r3kDDht0iBO534HUpsqhc00duSZgQhsV/PR7ZWVxoPk7PEJeo4vXlJ 77986F7J5NLUTjMiLv5lTx49FcPbRd7a1jubsBtahJrwXj2GVvuy2i86G7HY+a+B eSxN7zJQgaFeLo0YPo7fZLBI0MAsIQt3nnZhKX0TMglbv/K8Aq64xiJqsVQdJ883 CeEt0HvSJhsSC0C4O595NEINfDhDd+5IeSF9MvsujYXiUKRXtRkm1EPuAzTcZIzW NwkCLRo33NMXa+khMKaiqF/g7INayPUXoWESx75NXFsuNfcORvstkeUuEoi5GwJo TSlmosFqwRjghQ8eTLZuWBzmh3EpPGdtC4gm6D+lbzhzjah5c/1whyuLqra275kK E3Qt0/V0ixKyvlG7MI5yYh3L7+R/hrsflH7xIJJxZp2DW6mwBJzQYmkxDbSS8PzK nWien2XgpIQhSFat3QqreEFSfNkzdN2MClVi2Y1hpAgi+2Zm9rPdPNGcQI+DSOsB kpJkjOjWNJU/PQ== =dB2S -----END PGP SIGNATURE----- Merge tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - A large series adding wrappers for our interrupt handlers, so that irq/nmi/user tracking can be isolated in the wrappers rather than spread in each handler. - Conversion of the 32-bit syscall handling into C. - A series from Nick to streamline our TLB flushing when using the Radix MMU. - Switch to using queued spinlocks by default for 64-bit server CPUs. - A rework of our PCI probing so that it happens later in boot, when more generic infrastructure is available. - Two small fixes to allow 32-bit little-endian processes to run on 64-bit kernels. - Other smaller features, fixes & cleanups. Thanks to: Alexey Kardashevskiy, Ananth N Mavinakayanahalli, Aneesh Kumar K.V, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chengyang Fan, Christophe Leroy, Christopher M. Riedl, Fabiano Rosas, Florian Fainelli, Frederic Barrat, Ganesh Goudar, Hari Bathini, Jiapeng Chong, Joseph J Allen, Kajol Jain, Markus Elfring, Michal Suchanek, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Pingfan Liu, Po-Hsu Lin, Qian Cai, Ram Pai, Randy Dunlap, Sandipan Das, Stephen Rothwell, Tyrel Datwyler, Will Springer, Yury Norov, and Zheng Yongjun. * tag 'powerpc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (188 commits) powerpc/perf: Adds support for programming of Thresholding in P10 powerpc/pci: Remove unimplemented prototypes powerpc/uaccess: Merge raw_copy_to_user_allowed() into raw_copy_to_user() powerpc/uaccess: Merge __put_user_size_allowed() into __put_user_size() powerpc/uaccess: get rid of small constant size cases in raw_copy_{to,from}_user() powerpc/64: Fix stack trace not displaying final frame powerpc/time: Remove get_tbl() powerpc/time: Avoid using get_tbl() spi: mpc52xx: Avoid using get_tbl() powerpc/syscall: Avoid storing 'current' in another pointer powerpc/32: Handle bookE debugging in C in syscall entry/exit powerpc/syscall: Do not check unsupported scv vector on PPC32 powerpc/32: Remove the counter in global_dbcr0 powerpc/32: Remove verification of MSR_PR on syscall in the ASM entry powerpc/syscall: implement system call entry/exit logic in C for PPC32 powerpc/32: Always save non volatile GPRs at syscall entry powerpc/syscall: Change condition to check MSR_RI powerpc/syscall: Save r3 in regs->orig_r3 powerpc/syscall: Use is_compat_task() powerpc/syscall: Make interrupt.c buildable on PPC32 ...	2021-02-22 14:34:00 -08:00
David Stevens	4a42d848db	KVM: x86/mmu: Consider the hva in mmu_notifier retry Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity. The primary benefit of this change is the reduction in the likelihood of extreme latency when handing a page fault due to another thread having been preempted while modifying host virtual addresses. Signed-off-by: David Stevens <stevensd@chromium.org> Message-Id: <20210222024522.1751719-3-stevensd@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2021-02-22 13:16:53 -05:00
Jiapeng Chong	c9df3f809c	powerpc/xive: Assign boolean values to a bool variable Fix the following coccicheck warnings: ./arch/powerpc/kvm/book3s_xive.c:1856:2-17: WARNING: Assignment of 0/1 to bool variable. ./arch/powerpc/kvm/book3s_xive.c:1854:2-17: WARNING: Assignment of 0/1 to bool variable. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1612680192-43116-1-git-send-email-jiapeng.chong@linux.alibaba.com	2021-02-11 23:35:06 +11:00
Nicholas Piggin	72476aaa46	KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests Commit `68ad28a4cd` ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") incorrectly removed the radix host instruction patch to skip re-loading the host SLB entries when exiting from a hash guest. Restore it. Fixes: `68ad28a4cd` ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-11 17:28:15 +11:00
Paul Mackerras	ab950e1acd	KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries Commit `68ad28a4cd` ("KVM: PPC: Book3S HV: Fix radix guest SLB side channel") changed the older guest entry path, with the side effect that vcpu->arch.slb_max no longer gets cleared for a radix guest. This means that a HPT guest which loads some SLB entries, switches to radix mode, runs the guest using the old guest entry path (e.g., because the indep_threads_mode module parameter has been set to false), and then switches back to HPT mode would now see the old SLB entries being present, whereas previously it would have seen no SLB entries. To avoid changing guest-visible behaviour, this adds a store instruction to clear vcpu->arch.slb_max for a radix guest using the old guest entry path. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-11 17:19:42 +11:00
Fabiano Rosas	a722076e94	KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 These machines don't support running both MMU types at the same time, so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is using Radix MMU. [paulus@ozlabs.org - added defensive check on kvmppc_hv_ops->hash_v3_possible] Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 16:19:36 +11:00
Fabiano Rosas	25edcc50d7	KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path The Facility Status and Control Register is a privileged SPR that defines the availability of some features in problem state. Since it can be written by the guest, we must restore it to the previous host value after guest exit. This restoration is currently done by taking the value from current->thread.fscr, which in the P9 path is not enough anymore because the guest could context switch the QEMU thread, causing the guest-current value to be saved into the thread struct. The above situation manifested when running a QEMU linked against a libc with System Call Vectored support, which causes scv instructions to be run by QEMU early during the guest boot (during SLOF), at which point the FSCR is 0 due to guest entry. After a few scv calls (1 to a couple hundred), the context switching happens and the QEMU thread runs with the guest value, resulting in a Facility Unavailable interrupt. This patch saves and restores the host value of FSCR in the inner guest entry loop in a way independent of current->thread.fscr. The old way of doing it is still kept in place because it works for the old entry path. Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Yang Li	63e9f23573	KVM: PPC: remove unneeded semicolon Eliminate the following coccicheck warning: ./arch/powerpc/kvm/booke.c:701:2-3: Unneeded semicolon Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	7a7f94a3a9	KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB IH=6 may preserve hypervisor real-mode ERAT entries and is the recommended SLBIA hint for switching partitions. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	078ebe35fc	KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	68ad28a4cd	KVM: PPC: Book3S HV: Fix radix guest SLB side channel The slbmte instruction is legal in radix mode, including radix guest mode. This means radix guests can load the SLB with arbitrary data. KVM host does not clear the SLB when exiting a guest if it was a radix guest, which would allow a rogue radix guest to use the SLB as a side channel to communicate with other guests. Fix this by ensuring the SLB is cleared when coming out of a radix guest. Only the first 4 entries are a concern, because radix guests always run with LPCR[UPRT]=1, which limits the reach of slbmte. slbia is not used (except in a non-performance-critical path) because it can clear cached translations. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Nicholas Piggin	b1b1697ae0	KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support This reverts much of commit `c01015091a` ("KVM: PPC: Book3S HV: Run HPT guests on POWER9 radix hosts"), which was required to run HPT guests on RPT hosts on early POWER9 CPUs without support for "mixed mode", which meant the host could not run with MMU on while guests were running. This code has some corner case bugs, e.g., when the guest hits a machine check or HMI the primary locks up waiting for secondaries to switch LPCR to host, which they never do. This could all be fixed in software, but most CPUs in production have mixed mode support, and those that don't are believed to be all in installations that don't use this capability. So simplify things and remove support. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Ravi Bangoria	d9a47edabc	KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR Introduce KVM_CAP_PPC_DAWR1 which can be used by QEMU to query whether KVM supports 2nd DAWR or not. The capability is by default disabled even when the underlying CPU supports 2nd DAWR. QEMU needs to check and enable it manually to use the feature. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Ravi Bangoria	bd1de1a0e6	KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR KVM code assumes single DAWR everywhere. Add code to support 2nd DAWR. DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/ unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR. Also, KVM will support 2nd DAWR only if CPU_FTR_DAWR1 is set. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Ravi Bangoria	122954ed7d	KVM: PPC: Book3S HV: Rename current DAWR macros and variables Power10 is introducing a second DAWR (Data Address Watchpoint Register). Use real register names (with suffix 0) from ISA for current macros and variables used by kvm. One exception is KVM_REG_PPC_DAWR. Keep it as it is because it's uapi so changing it will break userspace. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Ravi Bangoria	afe7504930	KVM: PPC: Book3S HV: Allow nested guest creation when L0 hv_guest_state > L1 On powerpc, L1 hypervisor takes help of L0 using H_ENTER_NESTED hcall to load L2 guest state in cpu. L1 hypervisor prepares the L2 state in struct hv_guest_state and passes a pointer to it via hcall. Using that pointer, L0 reads/writes that state directly from/to L1 memory. Thus L0 must be aware of hv_guest_state layout of L1. Currently it uses version field to achieve this. i.e. If L0 hv_guest_state.version != L1 hv_guest_state.version, L0 won't allow nested kvm guest. This restriction can be loosened up a bit. L0 can be taught to understand older layout of hv_guest_state, if we restrict the new members to be added only at the end, i.e. we can allow nested guest even when L0 hv_guest_state.version > L1 hv_guest_state.version. Though, the other way around is not possible. Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>	2021-02-10 14:31:08 +11:00
Christophe Leroy	fd659e8f2c	powerpc/32s: Change mfsrin() into a static inline function mfsrin() is a macro. Change in into an inline function to avoid conflicts in KVM and make it more evolutive. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/72c7b9879e2e2e6f5c27dadda6486386c2b50f23.1612612022.git.christophe.leroy@csgroup.eu	2021-02-09 01:10:15 +11:00
Nicholas Piggin	3a96570ffc	powerpc: convert interrupt handlers to use wrappers Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-29-npiggin@gmail.com	2021-02-09 00:02:12 +11:00
Nicholas Piggin	112665286d	KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs Interrupts that occur in kernel mode expect that context tracking is set to kernel. Enabling local irqs before context tracking switches from guest to host means interrupts can come in and trigger warnings about wrong context, and possibly worse. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210130130852.2952424-3-npiggin@gmail.com	2021-02-09 00:02:08 +11:00
Christophe Leroy	27f699579b	powerpc/kvm: Force selection of CONFIG_PPC_FPU book3s/32 kvm is designed with the assumption that an FPU is always present. Force selection of FPU support in the kernel when build KVM. Fixes: `7d68c89169` ("powerpc/32s: Allow deselecting CONFIG_PPC_FPU on mpc832x") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/74461a99fa1466f361532ca794ca0753be3d9f86.1611038044.git.christophe.leroy@csgroup.eu	2021-01-30 11:39:32 +11:00
Cédric Le Goater	d834915e8e	KVM: PPC: Book3S HV: Include prototypes It fixes these W=1 compile errors : CC [M] arch/powerpc/kvm/book3s_64_mmu_hv.o ../arch/powerpc/kvm/book3s_64_mmu_hv.c:879:5: error: no previous prototype for ‘kvm_unmap_hva_range_hv’ [-Werror=missing-prototypes] 879 \| int kvm_unmap_hva_range_hv(struct kvm kvm, unsigned long start, unsigned long end) \| ^~~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:888:6: error: no previous prototype for ‘kvmppc_core_flush_memslot_hv’ [-Werror=missing-prototypes] 888 \| void kvmppc_core_flush_memslot_hv(struct kvm kvm, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:970:5: error: no previous prototype for ‘kvm_age_hva_hv’ [-Werror=missing-prototypes] 970 \| int kvm_age_hva_hv(struct kvm kvm, unsigned long start, unsigned long end) \| ^~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:1011:5: error: no previous prototype for ‘kvm_test_age_hva_hv’ [-Werror=missing-prototypes] 1011 \| int kvm_test_age_hva_hv(struct kvm kvm, unsigned long hva) \| ^~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/book3s_64_mmu_hv.c:1019:6: error: no previous prototype for ‘kvm_set_spte_hva_hv’ [-Werror=missing-prototypes] 1019 \| void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte) \| ^~~~~~~~~~~~~~~~~~~ Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-20-clg@kaod.org	2021-01-30 11:39:31 +11:00
Cédric Le Goater	9236f57a9e	KVM: PPC: Make the VMX instruction emulation routines static These are only used locally. It fixes these W=1 compile errors : ../arch/powerpc/kvm/powerpc.c:1521:5: error: no previous prototype for ‘kvmppc_get_vmx_dword’ [-Werror=missing-prototypes] 1521 \| int kvmppc_get_vmx_dword(struct kvm_vcpu vcpu, int index, u64 val) \| ^~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/powerpc.c:1539:5: error: no previous prototype for ‘kvmppc_get_vmx_word’ [-Werror=missing-prototypes] 1539 \| int kvmppc_get_vmx_word(struct kvm_vcpu vcpu, int index, u64 val) \| ^~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/powerpc.c:1557:5: error: no previous prototype for ‘kvmppc_get_vmx_hword’ [-Werror=missing-prototypes] 1557 \| int kvmppc_get_vmx_hword(struct kvm_vcpu vcpu, int index, u64 val) \| ^~~~~~~~~~~~~~~~~~~~ ../arch/powerpc/kvm/powerpc.c:1575:5: error: no previous prototype for ‘kvmppc_get_vmx_byte’ [-Werror=missing-prototypes] 1575 \| int kvmppc_get_vmx_byte(struct kvm_vcpu vcpu, int index, u64 val) \| ^~~~~~~~~~~~~~~~~~~ Fixes: `acc9eb9305` ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-19-clg@kaod.org	2021-01-30 11:39:31 +11:00
Linus Torvalds	8a5be36b93	powerpc updates for 5.11 - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, Zhang Xiaoxu. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl/bURITHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgEzBEAC1Vwibcog2P9rkJPb0q3UGWSYSx25V h/LwwxtM9Tm14j/LZsSgkOgIsfMaWEBIw/8D4efQ7AX9aFo+R0c2DdQMx1MG5MXz gZk58+l3LwId6h9+OrwurpEW+ZmURLAtGMSyFdkeiZ3/XTnkbf1XnewC0QWQe56a EGLmjx1MFl45jspoy7UIUXsXoNJIfflEKhrgUzSUh8X2eLmvB9ws6A4BXxbVzyZl lZv3+uWimU2pFgdkB9jOCxoG4zFEr2o5ovLHG7zCCVo5JoXmTPQ5cMVBraH206ms +5vCmu4qI8uP5UlZW/mZfhrtDiMdHdQqqFOaQwVlOmoUbU6L6E6rxm3iVnov2Bbi iUgxoeJDxAb2cM2EWFK6oWVgr7+NkwvXM1IG8xtprhHrCdnC9r+psQr/dswb3LSg MJ7u/RCq3uixy2kWP8E0NEHw7ngQZ/ZKPqzfnmIWOC7tYUxgaL02I8Ff9/ZXAI2J CnmqFYOjrimHkcwXGOtKkXNvfU0DiL97qpK2AQNWElE8+bUUmpw+ltUrsdSycYmv Afc4WIcVrTA+a9laSLgjdZbolbNSa3p+cMIYdPrVx9g+xqygbxIKv+EDGNv1WHfD GU1gmohMY+ZkMOvFRMi8LAsEm0DH/etWE0py/8uyxDYKnGyD1Ur6452DStkmgGb2 azmcaOyLdb+HXA== =Ga3K -----END PGP SIGNATURE----- Merge tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Switch to the generic C VDSO, as well as some cleanups of our VDSO setup/handling code. - Support for KUAP (Kernel User Access Prevention) on systems using the hashed page table MMU, using memory protection keys. - Better handling of PowerVM SMT8 systems where all threads of a core do not share an L2, allowing the scheduler to make better scheduling decisions. - Further improvements to our machine check handling. - Show registers when unwinding interrupt frames during stack traces. - Improvements to our pseries (PowerVM) partition migration code. - Several series from Christophe refactoring and cleaning up various parts of the 32-bit code. - Other smaller features, fixes & cleanups. Thanks to: Alan Modra, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Ard Biesheuvel, Athira Rajeev, Balamuruhan S, Bill Wendling, Cédric Le Goater, Christophe Leroy, Christophe Lombard, Colin Ian King, Daniel Axtens, David Hildenbrand, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geert Uytterhoeven, Giuseppe Sacco, Greg Kurz, Harish, Jan Kratochvil, Jordan Niethe, Kaixu Xia, Laurent Dufour, Leonardo Bras, Madhavan Srinivasan, Mahesh Salgaonkar, Mathieu Desnoyers, Nathan Lynch, Nicholas Piggin, Oleg Nesterov, Oliver O'Halloran, Oscar Salvador, Po-Hsu Lin, Qian Cai, Qinglang Miao, Randy Dunlap, Ravi Bangoria, Sachin Sant, Sandipan Das, Sebastian Andrzej Siewior , Segher Boessenkool, Srikar Dronamraju, Tyrel Datwyler, Uwe Kleine-König, Vincent Stehlé, Youling Tang, and Zhang Xiaoxu. * tag 'powerpc-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (304 commits) powerpc/32s: Fix cleanup_cpu_mmu_context() compile bug powerpc: Add config fragment for disabling -Werror powerpc/configs: Add ppc64le_allnoconfig target powerpc/powernv: Rate limit opal-elog read failure message powerpc/pseries/memhotplug: Quieten some DLPAR operations powerpc/ps3: use dma_mapping_error() powerpc: force inlining of csum_partial() to avoid multiple csum_partial() with GCC10 powerpc/perf: Fix Threshold Event Counter Multiplier width for P10 powerpc/mm: Fix hugetlb_free_pmd_range() and hugetlb_free_pud_range() KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp KVM: PPC: fix comparison to bool warning KVM: PPC: Book3S: Assign boolean values to a bool variable powerpc: Inline setup_kup() powerpc/64s: Mark the kuap/kuep functions non __init KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering powerpc/xive: Improve error reporting of OPAL calls powerpc/xive: Simplify xive_do_source_eoi() powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG ...	2020-12-17 13:34:25 -08:00
Leonardo Bras	87fb4978ef	KVM: PPC: Book3S HV: Fix mask size for emulated msgsndp According to ISAv3.1 and ISAv3.0b, the msgsndp is described to split RB in: msgtype <- (RB) 32:36 payload <- (RB) 37:63 t <- (RB) 57:63 The current way of getting 'msgtype', and 't' is missing their MSB: msgtype: ((arg >> 27) & 0xf) : Gets (RB) 33:36, missing bit 32 t: (arg &= 0x3f) : Gets (RB) 58:63, missing bit 57 Fixes this by applying the correct mask. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201208215707.31149-1-leobras.c@gmail.com	2020-12-15 22:22:06 +11:00
Kaixu Xia	a300bf8c5f	KVM: PPC: fix comparison to bool warning Fix the following coccicheck warning: ./arch/powerpc/kvm/booke.c:503:6-16: WARNING: Comparison to bool ./arch/powerpc/kvm/booke.c:505:6-17: WARNING: Comparison to bool ./arch/powerpc/kvm/booke.c:507:6-16: WARNING: Comparison to bool Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604764178-8087-1-git-send-email-kaixuxia@tencent.com	2020-12-15 22:22:06 +11:00
Kaixu Xia	13751f8747	KVM: PPC: Book3S: Assign boolean values to a bool variable Fix the following coccinelle warnings: ./arch/powerpc/kvm/book3s_xics.c:476:3-15: WARNING: Assignment of 0/1 to bool variable ./arch/powerpc/kvm/book3s_xics.c:504:3-15: WARNING: Assignment of 0/1 to bool variable Reported-by: Tosk Robot <tencent_os_robot@tencent.com> Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1604730382-5810-1-git-send-email-kaixuxia@tencent.com	2020-12-15 22:22:06 +11:00
Cédric Le Goater	dddc4ef92d	KVM: PPC: Book3S HV: XIVE: Add a comment regarding VP numbering When the XIVE resources are allocated at the HW level, the VP structures describing the vCPUs of a guest are distributed among the chips to optimize the PowerBUS usage. For best performance, the guest vCPUs can be pinned to match the VP structure distribution. Currently, the VP identifiers are deduced from the vCPU id using the kvmppc_pack_vcpu_id() routine which is not incorrect but not optimal either. It VSMT is used, the result is not continuous and the constraints on HW resources described above can not be met. Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-14-clg@kaod.org	2020-12-11 09:53:11 +11:00
Cédric Le Goater	cf58b74666	powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_EOI_FW This flag was used to support the P9 DD1 and we have stopped supporting this CPU when DD2 came out. See skiboot commit: https://github.com/open-power/skiboot/commit/0b0d15e3c170 Also, remove eoi handler which is now unused. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-11-clg@kaod.org	2020-12-11 09:53:10 +11:00
Cédric Le Goater	b5277d18c6	powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_MASK_FW This flag was used to support the PHB4 LSIs on P9 DD1 and we have stopped supporting this CPU when DD2 came out. See skiboot commit: https://github.com/open-power/skiboot/commit/0b0d15e3c170 Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-10-clg@kaod.org	2020-12-11 09:53:10 +11:00
Cédric Le Goater	4cc0e36df2	powerpc/xive: Remove P9 DD1 flag XIVE_IRQ_FLAG_SHIFT_BUG This flag was used to support the PHB4 LSIs on P9 DD1 and we have stopped supporting this CPU when DD2 came out. See skiboot commit: https://github.com/open-power/skiboot/commit/0b0d15e3c170 Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-9-clg@kaod.org	2020-12-11 09:53:10 +11:00
Cédric Le Goater	4f1c3f7b08	powerpc/xive: Rename XIVE_IRQ_NO_EOI to show its a flag This is a simple cleanup to identify easily all flags of the XIVE interrupt structure. The interrupts flagged with XIVE_IRQ_FLAG_NO_EOI are the escalations used to wake up vCPUs in KVM. They are handled very differently from the rest. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-3-clg@kaod.org	2020-12-11 09:34:07 +11:00
Cédric Le Goater	9898367500	KVM: PPC: Book3S HV: XIVE: Show detailed configuration in debug output This is useful to track allocation of the HW resources on per guest basis. Making sure IPIs are local to the chip of the vCPUs reduces rerouting between interrupt controllers and gives better performance in case of pinning. Checking the distribution of VP structures on the chips also helps in reducing PowerBUS traffic. [ clg: resurrected show_sources and reworked ouput ] Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201210171450.1933725-2-clg@kaod.org	2020-12-11 09:34:07 +11:00
Nicholas Piggin	e89a8ca94b	powerpc/64s: Remove MSR[ISF] bit No supported processor implements this mode. Setting the bit in MSR values can be a bit confusing (and would prevent the bit from ever being reused). Remove it. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201106045340.1935841-1-npiggin@gmail.com	2020-12-09 23:48:14 +11:00
Christophe Leroy	ff57698a96	powerpc: Fix update form addressing in inline assembly In several places, inline assembly uses the "%Un" modifier to enable the use of instruction with update form addressing, but the associated "<>" constraint is missing. As mentioned in previous patch, this fails with gcc 4.9, so "<>" can't be used directly. Use UPD_CONSTR macro everywhere %Un modifier is used. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62eab5ca595485c192de1765bdac099f633a21d0.1603358942.git.christophe.leroy@csgroup.eu	2020-12-04 22:13:19 +11:00
Aneesh Kumar K.V	9f378b9f00	KVM: PPC: BOOK3S: PR: Ignore UAMOR SPR With power7 and above we expect the cpu to support keys. The number of keys are firmware controlled based on device tree. PR KVM do not expose key details via device tree. Hence when running with PR KVM we do run with MMU_FTR_KEY support disabled. But we can still get updates on UAMOR. Hence ignore access to them and for mfstpr return 0 indicating no AMR/IAMR update is no allowed. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201127044424.40686-3-aneesh.kumar@linux.ibm.com	2020-12-04 01:01:24 +11:00
Nicholas Piggin	1d15ffdfc9	KVM: PPC: Book3S HV: Ratelimit machine check messages coming from guests A number of machine check exceptions are triggerable by the guest. Ratelimit these to avoid a guest flooding the host console and logs. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Use dedicated ratelimit state, not printk_ratelimit()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-5-npiggin@gmail.com	2020-12-04 01:01:23 +11:00
Nicholas Piggin	067c9f9c98	KVM: PPC: Book3S HV: Don't attempt to recover machine checks for FWNMI enabled guests Guests that can deal with machine checks would actually prefer the hypervisor not to try recover for them. For example if SLB multi-hits are recovered by the hypervisor by clearing the SLB then the guest will not be able to log the contents and debug its programming error. If guests don't register for FWNMI, they may not be so capable and so the hypervisor will continue to recover for those. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201128070728.825934-4-npiggin@gmail.com	2020-12-04 01:01:23 +11:00
Greg Kurz	f54db39fbe	KVM: PPC: Book3S HV: XIVE: Fix vCPU id sanity check Commit `062cfab706` ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") updated kvmppc_xive_vcpu_id_valid() in a way that allows userspace to trigger an assertion in skiboot and crash the host: [ 696.186248988,3] XIVE[ IC 08 ] eq_blk != vp_blk (0 vs. 1) for target 0x4300008c/0 [ 696.186314757,0] Assert fail: hw/xive.c:2370:0 [ 696.186342458,0] Aborting! xive-kvCPU 0043 Backtrace: S: 0000000031e2b8f0 R: 0000000030013840 .backtrace+0x48 S: 0000000031e2b990 R: 000000003001b2d0 ._abort+0x4c S: 0000000031e2ba10 R: 000000003001b34c .assert_fail+0x34 S: 0000000031e2ba90 R: 0000000030058984 .xive_eq_for_target.part.20+0xb0 S: 0000000031e2bb40 R: 0000000030059fdc .xive_setup_silent_gather+0x2c S: 0000000031e2bc20 R: 000000003005a334 .opal_xive_set_vp_info+0x124 S: 0000000031e2bd20 R: 00000000300051a4 opal_entry+0x134 --- OPAL call token: 0x8a caller R1: 0xc000001f28563850 --- XIVE maintains the interrupt context state of non-dispatched vCPUs in an internal VP structure. We allocate a bunch of those on startup to accommodate all possible vCPUs. Each VP has an id, that we derive from the vCPU id for efficiency: static inline u32 kvmppc_xive_vp(struct kvmppc_xive xive, u32 server) { return xive->vp_base + kvmppc_pack_vcpu_id(xive->kvm, server); } The KVM XIVE device used to allocate KVM_MAX_VCPUS VPs. This was limitting the number of concurrent VMs because the VP space is limited on the HW. Since most of the time, VMs run with a lot less vCPUs, commit `062cfab706` ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") gave the possibility for userspace to tune the size of the VP block through the KVM_DEV_XIVE_NR_SERVERS attribute. The check in kvmppc_pack_vcpu_id() was changed from cpu < KVM_MAX_VCPUS xive->kvm->arch.emul_smt_mode to cpu < xive->nr_servers * xive->kvm->arch.emul_smt_mode The previous check was based on the fact that the VP block had KVM_MAX_VCPUS entries and that kvmppc_pack_vcpu_id() guarantees that packed vCPU ids are below KVM_MAX_VCPUS. We've changed the size of the VP block, but kvmppc_pack_vcpu_id() has nothing to do with it and it certainly doesn't ensure that the packed vCPU ids are below xive->nr_servers. kvmppc_xive_vcpu_id_valid() might thus return true when the VM was configured with a non-standard VSMT mode, even if the packed vCPU id is higher than what we expect. We end up using an unallocated VP id, which confuses OPAL. The assert in OPAL is probably abusive and should be converted to a regular error that the kernel can handle, but we shouldn't really use broken VP ids in the first place. Fix kvmppc_xive_vcpu_id_valid() so that it checks the packed vCPU id is below xive->nr_servers, which is explicitly what we want. Fixes: `062cfab706` ("KVM: PPC: Book3S HV: XIVE: Make VP block size configurable") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/160673876747.695514.1809676603724514920.stgit@bahia.lan	2020-12-01 21:45:51 +11:00
Michael Ellerman	20fa40b147	Merge branch 'fixes' into next Merge our fixes branch, in particular to bring in the changes for the entry/uaccess flush.	2020-11-25 23:17:31 +11:00
Christophe Leroy	120c0518ec	powerpc: Replace RFI by rfi on book3s/32 and booke For book3s/32 and for booke, RFI is just an rfi. Only 40x has a non trivial RFI. CONFIG_PPC_RTAS is never selected by 40x platforms. Make it more explicit by replacing RFI by rfi wherever possible. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/b901ddfdeb8a0a3b7cb59999599cdfde1bbfe834.1604854583.git.christophe.leroy@csgroup.eu	2020-11-19 16:56:54 +11:00
Aneesh Kumar K.V	e80639405c	powerpc/mm: Update tlbiel loop on POWER10 With POWER10, single tlbiel instruction invalidates all the congruence class of the TLB and hence we need to issue only one tlbiel with SET=0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201007053305.232879-1-aneesh.kumar@linux.ibm.com	2020-11-19 14:50:15 +11:00

1 2 3 4 5 ...

2378 Commits