Add ARCH_HAS_KCOV to ARM64 config. To avoid potential crashes, disable
instrumentation of the files in arch/arm64/kvm/hyp/*.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The GICv3 backend of the vgic is quite barrier heavy, in order
to ensure synchronization of the system registers and the
memory mapped view for a potential GICv2 guest.
But when the guest is using a GICv3 model, there is absolutely
no need to execute all these heavy barriers, and it is actually
beneficial to avoid them altogether.
This patch makes the synchonization conditional, and ensures
that we do not change the EL1 SRE settings if we do not need to.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Both our GIC emulations are "strict", in the sense that we either
emulate a GICv2 or a GICv3, and not a GICv3 with GICv2 legacy
support.
But when running on a GICv3 host, we still allow the guest to
tinker with the ICC_SRE_EL1 register during its time slice:
it can switch SRE off, observe that it is off, and yet on the
next world switch, find the SRE bit to be set again. Not very
nice.
An obvious solution is to always trap accesses to ICC_SRE_EL1
(by clearing ICC_SRE_EL2.Enable), and to let the handler return
the programmed value on a read, or ignore the write.
That way, the guest can always observe that our GICv3 is SRE==1
only.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When we trap ICC_SRE_EL1, we handle it as RAZ/WI. It would be
more correct to actual make it RO, and return the configured
value when read.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When saving the state of the list registers, it is critical to
reset them zero, as we could otherwise leave unexpected EOI
interrupts pending for virtual level interrupts.
Cc: stable@vger.kernel.org # v4.6+
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(kvm_stat had nothing to do with QEMU in the first place -- the tool
only interprets debugfs)
- expose per-vm statistics in debugfs and support them in kvm_stat
(KVM always collected per-vm statistics, but they were summarised into
global statistics)
x86:
- fix dynamic APICv (VMX was improperly configured and a guest could
access host's APIC MSRs, CVE-2016-4440)
- minor fixes
ARM changes from Christoffer Dall:
"This set of changes include the new vgic, which is a reimplementation
of our horribly broken legacy vgic implementation. The two
implementations will live side-by-side (with the new being the
configured default) for one kernel release and then we'll remove the
legacy one.
Also fixes a non-critical issue with virtual abort injection to
guests."
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCAAGBQJXRz0KAAoJEED/6hsPKofosiMIAIHmRI+9I6VMNmQe5vrZKz9/
vt89QGxDJrFQwhEuZovenLEDaY6rMIJNguyvIbPhNuXNHIIPWbe6cO6OPwByqkdo
WI/IIqcAJN/Bpwt4/Y2977A5RwDOwWLkaDs0LrZCEKPCgeh9GWQf+EfyxkDJClhG
uIgbSAU+t+7b05K3c6NbiQT/qCzDTCdl6In6PI/DFSRRkXDaTcopjjp1PmMUSSsR
AM8LGhEzMer+hGKOH7H5TIbN+HFzAPjBuDGcoZt0/w9IpmmS5OMd3ZrZ320cohz8
zZQooRcFrT0ulAe+TilckmRMJdMZ69fyw3nzfqgAKEx+3PaqjKSY/tiEgqqDJHY=
=EEBK
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second batch of KVM updates from Radim Krčmář:
"General:
- move kvm_stat tool from QEMU repo into tools/kvm/kvm_stat (kvm_stat
had nothing to do with QEMU in the first place -- the tool only
interprets debugfs)
- expose per-vm statistics in debugfs and support them in kvm_stat
(KVM always collected per-vm statistics, but they were summarised
into global statistics)
x86:
- fix dynamic APICv (VMX was improperly configured and a guest could
access host's APIC MSRs, CVE-2016-4440)
- minor fixes
ARM changes from Christoffer Dall:
- new vgic reimplementation of our horribly broken legacy vgic
implementation. The two implementations will live side-by-side
(with the new being the configured default) for one kernel release
and then we'll remove the legacy one.
- fix for a non-critical issue with virtual abort injection to guests"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (70 commits)
tools: kvm_stat: Add comments
tools: kvm_stat: Introduce pid monitoring
KVM: Create debugfs dir and stat files for each VM
MAINTAINERS: Add kvm tools
tools: kvm_stat: Powerpc related fixes
tools: Add kvm_stat man page
tools: Add kvm_stat vm monitor script
kvm:vmx: more complete state update on APICv on/off
KVM: SVM: Add more SVM_EXIT_REASONS
KVM: Unify traced vector format
svm: bitwise vs logical op typo
KVM: arm/arm64: vgic-new: Synchronize changes to active state
KVM: arm/arm64: vgic-new: enable build
KVM: arm/arm64: vgic-new: implement mapped IRQ handling
KVM: arm/arm64: vgic-new: Wire up irqfd injection
KVM: arm/arm64: vgic-new: Add vgic_v2/v3_enable
KVM: arm/arm64: vgic-new: vgic_init: implement map_resources
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_init
KVM: arm/arm64: vgic-new: vgic_init: implement vgic_create
KVM: arm/arm64: vgic-new: vgic_init: implement kvm_vgic_hyp_init
...
Now that the new VGIC implementation has reached feature parity with
the old one, add the new files to the build system and add a Kconfig
option to switch between the two versions.
We set the default to the new version to get maximum test coverage,
in case people experience problems they can switch back to the old
behaviour if needed.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
The EC field of the constructed ESR is conditionally modified by ORing in
ESR_ELx_EC_DABT_LOW for a data abort. However, ESR_ELx_EC_SHIFT is missing
from this condition.
Signed-off-by: Matt Evans <matt.evans@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
- x86: miscellaneous fixes, AVIC support (local APIC virtualization,
AMD version)
- s390: polling for interrupts after a VCPU goes to halted state is
now enabled for s390; use hardware provided information about facility
bits that do not need any hypervisor activity, and other fixes for
cpu models and facilities; improve perf output; floating interrupt
controller improvements.
- MIPS: miscellaneous fixes
- PPC: bugfixes only
- ARM: 16K page size support, generic firmware probing layer for
timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things outside
KVM, but they should all carry the necessary acks and it made the
merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com
"more formally and for documentation purposes".
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXPJjyAAoJEL/70l94x66DhioH/j4fwQ0FmfPSM9PArzaFHQdx
LNE3tU4+bobbsy1BJr4DiAaOUQn3DAgwUvGLWXdeLiOXtoWXBiFHKaxlqEsCA6iQ
xcTH1TgfxsVoqGQ6bT9X/2GCx70heYpcWG3f+zqBy7ZfFmQykLAC/HwOr52VQL8f
hUFi3YmTHcnorp0n5Xg+9r3+RBS4D/kTbtdn6+KCLnPJ0RcgNkI3/NcafTemoofw
Tkv8+YYFNvKV13qlIfVqxMa0GwWI3pP6YaNKhaS5XO8Pu16HuuF1JthJsUBDzwBa
RInp8R9MoXgsBYhLpz3jc9vWG7G9yDl5LehsD9KOUGOaFYJ7sQN+QZOusa6jFgA=
=llO5
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small release overall.
x86:
- miscellaneous fixes
- AVIC support (local APIC virtualization, AMD version)
s390:
- polling for interrupts after a VCPU goes to halted state is now
enabled for s390
- use hardware provided information about facility bits that do not
need any hypervisor activity, and other fixes for cpu models and
facilities
- improve perf output
- floating interrupt controller improvements.
MIPS:
- miscellaneous fixes
PPC:
- bugfixes only
ARM:
- 16K page size support
- generic firmware probing layer for timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things
outside KVM, but they should all carry the necessary acks and it
made the merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more
formally and for documentation purposes')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits)
KVM: MTRR: remove MSR 0x2f8
KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same
svm: Manage vcpu load/unload when enable AVIC
svm: Do not intercept CR8 when enable AVIC
svm: Do not expose x2APIC when enable AVIC
KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore
svm: Add VMEXIT handlers for AVIC
svm: Add interrupt injection via AVIC
KVM: x86: Detect and Initialize AVIC support
svm: Introduce new AVIC VMCB registers
KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks
KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg
KVM: x86: Misc LAPIC changes to expose helper functions
KVM: shrink halt polling even more for invalid wakeups
KVM: s390: set halt polling to 80 microseconds
KVM: halt_polling: provide a way to qualify wakeups during poll
KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
kvm: Conditionally register IRQ bypass consumer
...
- virt_to_page/page_address optimisations
- Support for NUMA systems described using device-tree
- Support for hibernate/suspend-to-disk
- Proper support for maxcpus= command line parameter
- Detection and graceful handling of AArch64-only CPUs
- Miscellaneous cleanups and non-critical fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJXNbgkAAoJELescNyEwWM0PtcIAK11xaOMmSqXz8fcTeNLw4dS
taaPWhjCYus8EhJyvTetfwk74+qVApdvKXKNKgODJXQEjeQx2brdUfbQZb31DTGT
798UYCAyEYCWkXspqi+/dpZEgUGPYH7uGOu2eDd19+PhTeX/EQSRX3fC9k0BNhvh
PN9pOgRcKAlIExZ6QYmT0g56VLtbCfFShN41mQ8HdpShl6pPJuhQ+kDDzudmRjuD
11/oYuOaVTnwbPuXn+sjOrWvMkfINHI70BAQnnBs0v+5c45mzpqEMsy0dYo2Pl2m
ar5lUFVIZggQkiqcOzqBzEgF+4gNw4LUu1DgK6cNKNMtL6k8E9zeOZMWeSVr0lg=
=bT5E
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- virt_to_page/page_address optimisations
- support for NUMA systems described using device-tree
- support for hibernate/suspend-to-disk
- proper support for maxcpus= command line parameter
- detection and graceful handling of AArch64-only CPUs
- miscellaneous cleanups and non-critical fixes
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
arm64: do not enforce strict 16 byte alignment to stack pointer
arm64: kernel: Fix incorrect brk randomization
arm64: cpuinfo: Missing NULL terminator in compat_hwcap_str
arm64: secondary_start_kernel: Remove unnecessary barrier
arm64: Ensure pmd_present() returns false after pmd_mknotpresent()
arm64: Replace hard-coded values in the pmd/pud_bad() macros
arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
arm64: Fix typo in the pmdp_huge_get_and_clear() definition
arm64: mm: remove unnecessary EXPORT_SYMBOL_GPL
arm64: always use STRICT_MM_TYPECHECKS
arm64: kvm: Fix kvm teardown for systems using the extended idmap
arm64: kaslr: increase randomization granularity
arm64: kconfig: drop CONFIG_RTC_LIB dependency
arm64: make ARCH_SUPPORTS_DEBUG_PAGEALLOC depend on !HIBERNATION
arm64: hibernate: Refuse to hibernate if the boot cpu is offline
arm64: kernel: Add support for hibernate/suspend-to-disk
PM / Hibernate: Call flush_icache_range() on pages restored in-place
arm64: Add new asm macro copy_page
arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
arm64: kernel: Include _AC definition in page.h
...
The ARMv8.1 architecture extensions introduce support for hardware
updates of the access and dirty information in page table entries. With
VTCR_EL2.HA enabled (bit 21), when the CPU accesses an IPA with the
PTE_AF bit cleared in the stage 2 page table, instead of raising an
Access Flag fault to EL2 the CPU sets the actual page table entry bit
(10). To ensure that kernel modifications to the page table do not
inadvertently revert a bit set by hardware updates, certain Stage 2
software pte/pmd operations must be performed atomically.
The main user of the AF bit is the kvm_age_hva() mechanism. The
kvm_age_hva_handler() function performs a "test and clear young" action
on the pte/pmd. This needs to be atomic in respect of automatic hardware
updates of the AF bit. Since the AF bit is in the same position for both
Stage 1 and Stage 2, the patch reuses the existing
ptep_test_and_clear_young() functionality if
__HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG is defined. Otherwise, the
existing pte_young/pte_mkold mechanism is preserved.
The kvm_set_s2pte_readonly() (and the corresponding pmd equivalent) have
to perform atomic modifications in order to avoid a race with updates of
the AF bit. The arm64 implementation has been re-written using
exclusives.
Currently, kvm_set_s2pte_writable() (and pmd equivalent) take a pointer
argument and modify the pte/pmd in place. However, these functions are
only used on local variables rather than actual page table entries, so
it makes more sense to follow the pte_mkwrite() approach for stage 1
attributes. The change to kvm_s2pte_mkwrite() makes it clear that these
functions do not modify the actual page table entries.
The (pte|pmd)_mkyoung() uses on Stage 2 entries (setting the AF bit
explicitly) do not need to be modified since hardware updates of the
dirty status are not supported by KVM, so there is no possibility of
losing such information.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
If memory is located above 1<<VA_BITS, kvm adds an extra level to its page
tables, merging the runtime tables and boot tables that contain the idmap.
This lets us avoid the trampoline dance during initialisation.
This also means there is no trampoline page mapped, so
__cpu_reset_hyp_mode() can't call __kvm_hyp_reset() in this page. The good
news is the idmap is still mapped, so we don't need the trampoline page.
The bad news is we can't call it directly as the idmap is above
HYP_PAGE_OFFSET, so its address is masked by kvm_call_hyp.
Add a function __extended_idmap_trampoline which will branch into
__kvm_hyp_reset in the idmap, change kvm_hyp_reset_entry() to return
this address if __kvm_cpu_uses_extended_idmap(). In this case
__kvm_hyp_reset() will still switch to the boot tables (which are the
merged tables that were already in use), and branch into the idmap (where
it already was).
This fixes boot failures on these systems, where we fail to execute the
missing trampoline page when tearing down kvm in init_subsystems():
[ 2.508922] kvm [1]: 8-bit VMID
[ 2.512057] kvm [1]: Hyp mode initialized successfully
[ 2.517242] kvm [1]: interrupt-controller@e1140000 IRQ13
[ 2.522622] kvm [1]: timer IRQ3
[ 2.525783] Kernel panic - not syncing: HYP panic:
[ 2.525783] PS:200003c9 PC:0000007ffffff820 ESR:86000005
[ 2.525783] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
[ 2.525783] VCPU: (null)
[ 2.525783]
[ 2.547667] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.6.0-rc5+ #1
[ 2.555137] Hardware name: Default string Default string/Default string, BIOS ROD0084E 09/03/2015
[ 2.563994] Call trace:
[ 2.566432] [<ffffff80080888d0>] dump_backtrace+0x0/0x240
[ 2.571818] [<ffffff8008088b24>] show_stack+0x14/0x20
[ 2.576858] [<ffffff80083423ac>] dump_stack+0x94/0xb8
[ 2.581899] [<ffffff8008152130>] panic+0x10c/0x250
[ 2.586677] [<ffffff8008152024>] panic+0x0/0x250
[ 2.591281] SMP: stopping secondary CPUs
[ 3.649692] SMP: failed to stop secondary CPUs 0-2,4-7
[ 3.654818] Kernel Offset: disabled
[ 3.658293] Memory Limit: none
[ 3.661337] ---[ end Kernel panic - not syncing: HYP panic:
[ 3.661337] PS:200003c9 PC:0000007ffffff820 ESR:86000005
[ 3.661337] FAR:0000007ffffff820 HPFAR:00000000003ffff0 PAR:0000000000000000
[ 3.661337] VCPU: (null)
[ 3.661337]
Reported-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The current kvm implementation on arm64 does cpu-specific initialization
at system boot, and has no way to gracefully shutdown a core in terms of
kvm. This prevents kexec from rebooting the system at EL2.
This patch adds a cpu tear-down function and also puts an existing cpu-init
code into a separate function, kvm_arch_hardware_disable() and
kvm_arch_hardware_enable() respectively.
We don't need the arm64 specific cpu hotplug hook any more.
Since this patch modifies common code between arm and arm64, one stub
definition, __cpu_reset_hyp_mode(), is added on arm side to avoid
compilation errors.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
[Rebase, added separate VHE init/exit path, changed resets use of
kvm_call_hyp() to the __version, en/disabled hardware in init_subsystems(),
added icache maintenance to __kvm_hyp_reset() and removed lr restore, removed
guest-enter after teardown handling]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
A later patch implements kvm_arch_hardware_disable(), to remove kvm
from el2, and re-instate the hyp-stub.
This can happen while guests are running, particularly when kvm_reboot()
calls kvm_arch_hardware_disable() on each cpu. This can interrupt a guest,
remove kvm, then allow the guest to be scheduled again. This causes
kvm_call_hyp() to be run against the hyp-stub.
Change the hyp-stub to return a new exception type when this happens,
and add code to kvm's handle_exit() to tell userspace we failed to
enter the guest.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The existing arm64 hcall implementations are limited in that they only
allow for two distinct hcalls; with the x0 register either zero or not
zero. Also, the API of the hyp-stub exception vector routines and the
KVM exception vector routines differ; hyp-stub uses a non-zero value in
x0 to implement __hyp_set_vectors, whereas KVM uses it to implement
kvm_call_hyp.
To allow for additional hcalls to be defined and to make the arm64 hcall
API more consistent across exception vector routines, change the hcall
implementations to reserve all x0 values below 0xfff for hcalls such
as {s,g}et_vectors().
Define two new preprocessor macros HVC_GET_VECTORS, and HVC_SET_VECTORS
to be used as hcall type specifiers and convert the existing
__hyp_get_vectors() and __hyp_set_vectors() routines to use these new
macros when executing an HVC call. Also, change the corresponding
hyp-stub and KVM el1_sync exception vector routines to use these new
macros.
Signed-off-by: Geoff Levand <geoff@infradead.org>
[Merged two hcall patches, moved immediate value from esr to x0, use lr
as a scratch register, changed limit to 0xfff]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Today the 'hvc' calling KVM or the hyp-stub is expected to preserve all
registers. KVM saves/restores the registers it needs on the EL2 stack using
do_el2_call(). The hyp-stub has no stack, later patches need to be able to
be able to clobber the link register.
Move the link register save/restore to the the call sites.
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
We currently have macros defining flags for the arm64 sctlr registers in
both kvm_arm.h and sysreg.h. To clean things up and simplify move the
definitions of the SCTLR_EL2 flags from kvm_arm.h to sysreg.h, rename any
SCTLR_EL1 or SCTLR_EL2 flags that are common to both registers to be
SCTLR_ELx, with 'x' indicating a common flag, and fixup all files to
include the proper header or to use the new macro names.
Signed-off-by: Geoff Levand <geoff@infradead.org>
[Restored pgtable-hwdef.h include]
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that we can handle stage-2 page tables independent
of the host page table levels, wire up the 16K page
support.
Cc: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
We always thought that 40bits of PA range would be the minimum people
would actually build. Anything less is terrifyingly small.
Turns out that we were both right and wrong. Nobody has ever built
such a system, but the ARM Foundation Model has a PARange set to 36bits.
Just because we can. Oh well. Now, the KVM API explicitely says that
we offer a 40bit PA space to the VM, so we shouldn't run KVM on
the Foundation Model at all.
That being said, this patch offers a less agressive alternative, and
loudly warns about the configuration being unsupported. You'll still
be able to run VMs (at your own risks, though).
This is just a workaround until we have a proper userspace API where
we report the PARange to userspace.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When we detect support for 16bit VMID in ID_AA64MMFR1, we set the
VTCR_EL2_VS field to 1 to make use of 16bit vmids. But, with
commit 3a3604bc5e ("arm64: KVM: Switch to C-based stage2 init")
this is broken and we corrupt VTCR_EL2:T0SZ instead of updating the VS
field. VTCR_EL2_VS was actually defined to the field shift (19) and
not the real value for VS. This patch fixes the issue.
Fixes: commit 3a3604bc5e ("arm64: KVM: Switch to C-based stage2 init")
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
With the recent rewrite of the arm64 KVM hypervisor code in C, enabling
certain options like KASAN would allow the compiler to generate memory
accesses or function calls to addresses not mapped at EL2. This patch
disables the compiler instrumentation on the arm64 hypervisor code for
gcov-based profiling (GCOV_KERNEL), undefined behaviour sanity checker
(UBSAN) and kernel address sanitizer (KASAN).
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: <stable@vger.kernel.org> # 4.5+
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture requires
break-before-make in such cases to avoid TLB conflicts but that's not
always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked to
the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of
the vmalloc space, allowing the kernel to be loaded (nearly) anywhere
in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is provided
by UEFI (efi_get_random_bytes() patches merged via the arm64 tree,
acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but
actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this allows
uaccess functions (get_user etc.) to be implemented using LDTR/STTR
instructions. Such instructions, when run by the kernel, perform
unprivileged accesses adding an extra level of protection. The
set_fs() macro is used to "upgrade" such instruction to privileged
accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the sigcontext
information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW6u95AAoJEGvWsS0AyF7xMyoP/3x2O6bgreSQ84BdO4JChN4+
RQ9OVdX8u2ItO9sgaCY2AA6KoiBuEjGmPl/XRuK0I7DpODTtRjEXQHuNNhz8AelC
hn4AEVqamY6Z5BzHFIjs8G9ydEbq+OXcKWEdwSsBhP/cMvI7ss3dps1f5iNPT5Vv
50E/kUz+aWYy7pKlB18VDV7TUOA3SuYuGknWV8+bOY5uPb8hNT3Y3fHOg/EuNNN3
DIuYH1V7XQkXtF+oNVIGxzzJCXULBE7egMcWAm1ydSOHK0JwkZAiL7OhI7ceVD0x
YlDxBnqmi4cgzfBzTxITAhn3OParwN6udQprdF1WGtFF6fuY2eRDSH/L/iZoE4DY
OulL951OsBtF8YC3+RKLk908/0bA2Uw8ftjCOFJTYbSnZBj1gWK41VkCYMEXiHQk
EaN8+2Iw206iYIoyvdjGCLw7Y0oakDoVD9vmv12SOaHeQljTkjoN8oIlfjjKTeP7
3AXj5v9BDMDVh40nkVayysRNvqe48Kwt9Wn0rhVTLxwdJEiFG/OIU6HLuTkretdN
dcCNFSQrRieSFHpBK9G0vKIpIss1ZwLm8gjocVXH7VK4Mo/TNQe4p2/wAF29mq4r
xu1UiXmtU3uWxiqZnt72LOYFCarQ0sFA5+pMEvF5W+NrVB0wGpXhcwm+pGsIi4IM
LepccTgykiUBqW5TRzPz
=/oS+
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Here are the main arm64 updates for 4.6. There are some relatively
intrusive changes to support KASLR, the reworking of the kernel
virtual memory layout and initial page table creation.
Summary:
- Initial page table creation reworked to avoid breaking large block
mappings (huge pages) into smaller ones. The ARM architecture
requires break-before-make in such cases to avoid TLB conflicts but
that's not always possible on live page tables
- Kernel virtual memory layout: the kernel image is no longer linked
to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
of the vmalloc space, allowing the kernel to be loaded (nearly)
anywhere in physical RAM
- Kernel ASLR: position independent kernel Image and modules being
randomly mapped in the vmalloc space with the randomness is
provided by UEFI (efi_get_random_bytes() patches merged via the
arm64 tree, acked by Matt Fleming)
- Implement relative exception tables for arm64, required by KASLR
(initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
but actual x86 conversion to deferred to 4.7 because of the merge
dependencies)
- Support for the User Access Override feature of ARMv8.2: this
allows uaccess functions (get_user etc.) to be implemented using
LDTR/STTR instructions. Such instructions, when run by the kernel,
perform unprivileged accesses adding an extra level of protection.
The set_fs() macro is used to "upgrade" such instruction to
privileged accesses via the UAO bit
- Half-precision floating point support (part of ARMv8.2)
- Optimisations for CPUs with or without a hardware prefetcher (using
run-time code patching)
- copy_page performance improvement to deal with 128 bytes at a time
- Sanity checks on the CPU capabilities (via CPUID) to prevent
incompatible secondary CPUs from being brought up (e.g. weird
big.LITTLE configurations)
- valid_user_regs() reworked for better sanity check of the
sigcontext information (restored pstate information)
- ACPI parking protocol implementation
- CONFIG_DEBUG_RODATA enabled by default
- VDSO code marked as read-only
- DEBUG_PAGEALLOC support
- ARCH_HAS_UBSAN_SANITIZE_ALL enabled
- Erratum workaround Cavium ThunderX SoC
- set_pte_at() fix for PROT_NONE mappings
- Code clean-ups"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Fix misspellings in comments.
arm64: efi: add missing frame pointer assignment
arm64: make mrs_s prefixing implicit in read_cpuid
arm64: enable CONFIG_DEBUG_RODATA by default
arm64: Rework valid_user_regs
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: KVM: Move kvm_call_hyp back to its original localtion
arm64: mm: treat memstart_addr as a signed quantity
arm64: mm: list kernel sections in order
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
arm64: kconfig: add submenu for 8.2 architectural features
arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
arm64: Add support for Half precision floating point
arm64: Remove fixmap include fragility
arm64: Add workaround for Cavium erratum 27456
arm64: mm: Mark .rodata as RO
...
but lots of architecture-specific changes.
* ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.
* PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).
* s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.
* x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using vector
hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest memory---currently
its only use is to speedup the legacy shadow paging (pre-EPT) case, but
in the future it will be used for virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
=OUZZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.
ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.
PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).
s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.
x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speedup the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...
So far, we're always writing all possible LRs, setting the empty
ones with a zero value. This is obvious doing a low of work for
nothing, and we're better off clearing those we've actually
dirtied on the exit path (it is very rare to inject more than one
interrupt at a time anyway).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to let the GICv3 code be more lazy in the way it
accesses the LRs, it is necessary to start with a clean slate.
Let's reset the LRs on each CPU when the vgic is probed (which
includes a round trip to EL2...).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On exit, any empty LR will be signaled in ICH_ELRSR_EL2. Which
means that we do not have to save it, and we can just clear
its state in the in-memory copy.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Next on our list of useless accesses is the maintenance interrupt
status registers (ICH_MISR_EL2, ICH_EISR_EL2).
It is pointless to save them if we haven't asked for a maintenance
interrupt the first place, which can only happen for two reasons:
- Underflow: ICH_HCR_UIE will be set,
- EOI: ICH_LR_EOI will be set.
These conditions can be checked on the in-memory copies of the regs.
Should any of these two condition be valid, we must read GICH_MISR.
We can then check for ICH_MISR_EOI, and only when set read
ICH_EISR_EL2.
This means that in most case, we don't have to save them at all.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Just like on GICv2, we're a bit hammer-happy with GICv3, and access
them more often than we should.
Adopt a policy similar to what we do for GICv2, only save/restoring
the minimal set of registers. As we don't access the registers
linearly anymore (we may skip some), the convoluted accessors become
slightly simpler, and we can drop the ugly indexing macro that
tended to confuse the reviewers.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Our 64bit sys_reg table is about 90 entries long (so far, and the
PMU support is likely to increase this). This means that on average,
it takes 45 comparaisons to find the right entry (and actually the
full 90 if we have to search the invariant table).
Not the most efficient thing. Specially when you think that this
table is already sorted. Switching to a binary search effectively
reduces the search to about 7 comparaisons. Slightly better!
As an added bonus, the comparison is done by comparing all the
fields at once, instead of one at a time.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
To configure the virtual PMUv3 overflow interrupt number, we use the
vcpu kvm_device ioctl, encapsulating the KVM_ARM_VCPU_PMU_V3_IRQ
attribute within the KVM_ARM_VCPU_PMU_V3_CTRL group.
After configuring the PMUv3, call the vcpu ioctl with attribute
KVM_ARM_VCPU_PMU_V3_INIT to initialize the PMUv3.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In some cases it needs to get/set attributes specific to a vcpu and so
needs something else than ONE_REG.
Let's copy the KVM_DEVICE approach, and define the respective ioctls
for the vcpu file descriptor.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
To support guest PMUv3, use one bit of the VCPU INIT feature array.
Initialize the PMU when initialzing the vcpu with that bit and PMU
overflow interrupt set.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When resetting vcpu, it needs to reset the PMU state to initial status.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This register resets as unknown in 64bit mode while it resets as zero
in 32bit mode. Here we choose to reset it as zero for consistency.
PMUSERENR_EL0 holds some bits which decide whether PMU registers can be
accessed from EL0. Add some check helpers to handle the access from EL0.
When these bits are zero, only reading PMUSERENR will trap to EL2 and
writing PMUSERENR or reading/writing other PMU registers will trap to
EL1 other than EL2 when HCR.TGE==0. To current KVM configuration
(HCR.TGE==0) there is no way to get these traps. Here we write 0xf to
physical PMUSERENR register on VM entry, so that it will trap PMU access
from EL0 to EL2. Within the register access handler we check the real
value of guest PMUSERENR register to decide whether this access is
allowed. If not allowed, return false to inject UND to guest.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
According to ARMv8 spec, when writing 1 to PMCR.E, all counters are
enabled by PMCNTENSET, while writing 0 to PMCR.E, all counters are
disabled. When writing 1 to PMCR.P, reset all event counters, not
including PMCCNTR, to zero. When writing 1 to PMCR.C, reset PMCCNTR to
zero.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add access handler which emulates writing and reading PMSWINC
register and add support for creating software increment event.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Since the reset value of PMOVSSET and PMOVSCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMOVSSET or PMOVSCLR register.
When writing non-zero value to PMOVSSET, the counter and its interrupt
is enabled, kick this vcpu to sync PMU interrupt.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Since the reset value of PMINTENSET and PMINTENCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMINTENSET or PMINTENCLR register.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
These kind of registers include PMEVTYPERn, PMCCFILTR and PMXEVTYPER
which is mapped to PMEVTYPERn or PMCCFILTR.
The access handler translates all aarch32 register offsets to aarch64
ones and uses vcpu_sys_reg() to access their values to avoid taking care
of big endian.
When writing to these registers, create a perf_event for the selected
event type.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Since the reset value of PMCNTENSET and PMCNTENCLR is UNKNOWN, use
reset_unknown for its reset handler. Add a handler to emulate writing
PMCNTENSET or PMCNTENCLR register.
When writing to PMCNTENSET, call perf_event_enable to enable the perf
event. When writing to PMCNTENCLR, call perf_event_disable to disable
the perf event.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
These kind of registers include PMEVCNTRn, PMCCNTR and PMXEVCNTR which
is mapped to PMEVCNTRn.
The access handler translates all aarch32 register offsets to aarch64
ones and uses vcpu_sys_reg() to access their values to avoid taking care
of big endian.
When reading these registers, return the sum of register value and the
value perf event counts.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add access handler which gets host value of PMCEID0 or PMCEID1 when
guest access these registers. Writing action to PMCEID0 or PMCEID1 is
UNDEFINED.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Since the reset value of PMSELR_EL0 is UNKNOWN, use reset_unknown for
its reset handler. When reading PMSELR, return the PMSELR.SEL field to
guest.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Add reset handler which gets host value of PMCR_EL0 and make writable
bits architecturally UNKNOWN except PMCR.E which is zero. Add an access
handler for PMCR.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Here we plan to support virtual PMU for guest by full software
emulation, so define some basic structs and functions preparing for
futher steps. Define struct kvm_pmc for performance monitor counter and
struct kvm_pmu for performance monitor unit for each vcpu. According to
ARMv8 spec, the PMU contains at most 32(ARMV8_PMU_MAX_COUNTERS)
counters.
Since this only supports ARM64 (or PMUv3), add a separate config symbol
for it.
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We already have virt/kvm/arm/ containing timer and vgic stuff.
Add yet another subdirectory to contain the hyp-specific files
(timer and vgic again).
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to be able to move code outside of kvm/hyp, we need to make
the global hyp.h file accessible from a standard location.
include/asm/kvm_hyp.h seems good enough.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The fault decoding process (including computing the IPA in the case
of a permission fault) would be much better done in C code, as we
have a reasonable infrastructure to deal with the VHE/non-VHE
differences.
Let's move the whole thing to C, including the workaround for
erratum 834220, and just patch the odd ESR_EL2 access remaining
in hyp-entry.S.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As the kernel fully runs in HYP when VHE is enabled, we can
directly branch to the kernel's panic() implementation, and
not perform an exception return.
Add the alternative code to deal with this.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Despite the fact that a VHE enabled kernel runs at EL2, it uses
CPACR_EL1 to trap FPSIMD access. Add the required alternative
code to re-enable guest FPSIMD access when it has trapped to
EL2.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Switch the timer code to the unified sysreg accessors.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Running the kernel in HYP mode requires the HCR_E2H bit to be set
at all times, and the HCR_TGE bit to be set when running as a host
(and cleared when running as a guest). At the same time, the vector
must be set to the current role of the kernel (either host or
hypervisor), and a couple of system registers differ between VHE
and non-VHE.
We implement these by using another set of alternate functions
that get dynamically patched.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As non-VHE and VHE have different ways to express the trapping of
FPSIMD registers to EL2, make __fpsimd_enabled a patchable predicate
and provide a VHE implementation.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We're now in a position where we can introduce VHE's minimal
save/restore, which is limited to the handful of shared sysregs.
Add the required alternative function calls that result in a
"do nothing" call on VHE, and the normal save/restore for non-VHE.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Use the recently introduced unified system register accessors for
those sysregs that behave differently depending on VHE being in
use or not.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A handful of system registers are still shared between host and guest,
even while using VHE (tpidr*_el[01] and actlr_el1).
Also, some of the vcpu state (sp_el0, PC and PSTATE) must be
save/restored on entry/exit, as they are used on the host as well.
In order to facilitate the introduction of a VHE-specific sysreg
save/restore, make move the access to these registers to their
own save/restore functions.
No functional change.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
With ARMv8, host and guest share the same system register file,
making the save/restore procedure completely symetrical.
With VHE, host and guest now have different requirements, as they
use different sysregs.
In order to prepare for this, add split sysreg save/restore functions
for both host and guest. No functional changes yet.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
VHE brings its own bag of new system registers, or rather system
register accessors, as it define new ways to access both guest
and host system registers. For example, from the host:
- The host TCR_EL2 register is accessed using the TCR_EL1 accessor
- The guest TCR_EL1 register is accessed using the TCR_EL12 accessor
Obviously, this is confusing. A way to somehow reduce the complexity
of writing code for both ARMv8 and ARMv8.1 is to use a set of unified
accessors that will generate the right sysreg, depending on the mode
the CPU is running in. For example:
- read_sysreg_el1(tcr) will use TCR_EL1 on ARMv8, and TCR_EL12 on
ARMv8.1 with VHE.
- read_sysreg_el2(tcr) will use TCR_EL2 on ARMv8, and TCR_EL1 on
ARMv8.1 with VHE.
We end up with three sets of accessors ({read,write}_sysreg_el[012])
that can be directly used from C code. We take this opportunity to
also add the definition for the new VHE sysregs.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The kern_hyp_va macro is pretty meaninless with VHE, as there is
only one mapping - the kernel one.
In order to keep the code readable and efficient, use runtime
patching to replace the 'and' instruction used to compute the VA
with a 'nop'.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
With VHE, the host never issues an HVC instruction to get into the
KVM code, as we can simply branch there.
Use runtime code patching to simplify things a bit.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
There is no real need to leave the stage2 initialization as part
of the early HYP bootstrap, and we can easily postpone it to
the point where we can safely run C code.
This will help VHE, which doesn't need any of this bootstrap.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Calling return copy_to_user(...) in an ioctl will not
do the right thing if there's a pagefault:
copy_to_user returns the number of bytes not copied
in this case.
Fix up kvm to do
return copy_to_user(...)) ? -EFAULT : 0;
everywhere.
Cc: stable@vger.kernel.org
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Now that we have a clear understanding of the sign of a feature,
rename the routines to reflect the sign, so that it is not misused.
The cpuid_feature_extract_field() now accepts a 'sign' parameter.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The GICv3 architecture spec says:
Writing to the active priority registers in any order other than
the following order will result in UNPREDICTABLE behavior:
- ICH_AP0R<n>_EL2.
- ICH_AP1R<n>_EL2.
So let's not pointlessly go against the rule...
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
KVM on arm64 uses a fixed offset between the linear mapping at EL1 and
the HYP mapping at EL2. Before we can move the kernel virtual mapping
out of the linear mapping, we have to make sure that references to kernel
symbols that are accessed via the HYP mapping are translated to their
linear equivalent.
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Currently, using BUG_ON() in header files is cumbersome, due to the fact
that asm/bug.h transitively includes a lot of other header files, resulting
in the actual BUG_ON() invocation appearing before its definition in the
preprocessor input. So let's reverse the #include dependency between
asm/bug.h and asm/debug-monitors.h, by moving the definition of BUG_BRK_IMM
from the latter to the former. Also fix up one user of asm/debug-monitors.h
which relied on a transitive include.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Setting TCR_EL2.PS to 40 bits is wrong on systems with less that
less than 40 bits of physical addresses. and breaks KVM on systems
where the RAM is above 40 bits.
This patch uses ID_AA64MMFR0_EL1.PARange to set TCR_EL2.PS dynamically,
just like we already do for VTCR_EL2.PS.
[Marc: rewrote commit message, patch tidy up]
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Currently emulate_cp will return 0 (Handled) no matter what the accessor
returns. If register accessor returns false, it will not skip current PC
while emulate_cp return handled. Then guest will stuck in a dead loop.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Make sure the documentation reflects the actual name of the functions.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Some bits in CPTR are defined as RES1 in the architecture. Setting
these bits to zero may unintentionally enable future architecture
extensions, allowing guests to use them without supervision by the host.
This would be bad: for forwards compatibility, this patch makes
sure the affected bits are always written with 1, not 0.
This patch only addresses CPTR_EL2. Initialisation of other system
registers may still need review.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
At the moment, our fault injection is pretty limited. We always
generate a SYNC exception into EL1, as if the fault was actually
from EL1h, no matter how it was generated.
This is obviously wrong, as EL0 can generate faults of its own
(not to mention the pretty-much unused EL1t mode).
This patch fixes it by implementing section D1.10.2 of the ARMv8 ARM,
and in particular table D1-7 ("Vector offsets from vector table base
address"), which describes which vector to use depending on the source
exception level and type (synchronous, IRQ, FIQ or SError).
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Shannon Zhao <shannon.zhao@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The ARMv8.1 architecture extension allows to choose between 8-bit and
16-bit of VMID, so use this capability for KVM.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The debug trapping code is pretty heavy on the "inline" attribute,
but most functions are actually referenced in the sysreg tables,
making the inlining imposible.
Removing the useless inline qualifier seems the right thing to do,
having verified that the output code is similar.
Cc: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we've now switched to the new world switch implementation,
remove the weak attributes, as nobody is supposed to override
it anymore.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Having the system register numbers as #defines has been a pain
since day one, as the ordering is pretty fragile, and moving
things around leads to renumbering and epic conflict resolutions.
Now that we're mostly acessing the sysreg file in C, an enum is
a much better type to use, and we can clean things up a bit.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
This is it. We remove all of the code that has now been rewritten.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
So far, we've implemented the new world switch with a completely
different namespace, so that we could have both implementation
compiled in.
Let's take things one step further by adding weak aliases that
have the same names as the original implementation. The weak
attributes allows the new implementation to be overriden by the
old one, and everything still work.
At a later point, we'll be able to simply drop the old code, and
everything will hopefully keep working, thanks to the aliases we
have just added. This also saves us repainting all the callers.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Add the panic handler, together with the small bits of assembly
code to call the kernel's panic implementation.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Add the entry points for HYP mode (both for hypercalls and
exception handling).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Implement the TLB handling as a direct translation of the assembly
code version.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Implement the fpsimd save restore, keeping the lazy part in
assembler (as returning to C would be overkill).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Implement the core of the world switch in C. Not everything is there
yet, and there is nothing to re-enter the world switch either.
But this already outlines the code structure well enough.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
KVM so far relies on code patching, and is likely to use it more
in the future. The main issue is that our alternative system works
at the instruction level, while we'd like to have alternatives at
the function level.
In order to cope with this, add the "hyp_alternate_select" macro that
outputs a brief sequence of code that in turn can be patched, allowing
an alternative function to be selected.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Contrary to the previous patch, the guest entry is fairly different
from its assembly counterpart, mostly because it is only concerned
with saving/restoring the GP registers, and nothing else.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Implement the debug save restore as a direct translation of
the assembly code version.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Implement the 32bit system register save/restore as a direct
translation of the assembly code version.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Implement the system register save/restore as a direct translation of
the assembly code version.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
In order to expose the various EL2 services that are private to
the hypervisor, add a new hyp.h file.
So far, it only contains mundane things such as section annotation
and VA manipulation.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
It would add guest exit statistics to debugfs, this can be helpful
while measuring KVM performance.
[ Renamed some of the field names - Christoffer ]
Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Using oldstyle vcpu_reg() accessor is proven to be inappropriate and
unsafe on ARM64. This patch converts the rest of use cases to new
accessors and completely removes vcpu_reg() on ARM64.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
System register accesses also use zero register for Rt == 31, and
therefore using it will also result in getting SP value instead. This
patch makes them also using new accessors, introduced by the previous
patch. Since register value is no longer directly associated with storage
inside vCPU context structure, we introduce a dedicated storage for it in
struct sys_reg_params.
This refactor also gets rid of "massive hack" in kvm_handle_cp_64().
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Further rework is going to introduce a dedicated storage for transfer
register value in struct sys_reg_params. Before doing this we have to
remove 'const' modifiers from it in all accessor functions and their
callers.
Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
If we call __kvm_hyp_panic while a guest context is active, we call
__restore_sysregs before acquiring the system register values for the
panic, in the process throwing away the PAR_EL1 value at the point of
the panic.
This patch modifies __kvm_hyp_panic to stash the PAR_EL1 value prior to
restoring host register values, enabling us to report the original
values at the point of the panic.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Currently __kvm_hyp_panic uses %p for values which are not pointers,
such as the ESR value. This can confusingly lead to "(null)" being
printed for the value.
Use %x instead, and only use %p for host pointers.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Cortex-A57 parts up to r1p2 can misreport Stage 2 translation faults
when a Stage 1 permission fault or device alignment fault should
have been reported.
This patch implements the workaround (which is to validate that the
Stage-1 translation actually succeeds) by using code patching.
Cc: stable@vger.kernel.org
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
When running a 32bit guest under a 64bit hypervisor, the ARMv8
architecture defines a mapping of the 32bit registers in the 64bit
space. This includes banked registers that are being demultiplexed
over the 64bit ones.
On exceptions caused by an operation involving a 32bit register, the
HW exposes the register number in the ESR_EL2 register. It was so
far understood that SW had to distinguish between AArch32 and AArch64
accesses (based on the current AArch32 mode and register number).
It turns out that I misinterpreted the ARM ARM, and the clue is in
D1.20.1: "For some exceptions, the exception syndrome given in the
ESR_ELx identifies one or more register numbers from the issued
instruction that generated the exception. Where the exception is
taken from an Exception level using AArch32 these register numbers
give the AArch64 view of the register."
Which means that the HW is already giving us the translated version,
and that we shouldn't try to interpret it at all (for example, doing
an MMIO operation from the IRQ mode using the LR register leads to
very unexpected behaviours).
The fix is thus not to perform a call to vcpu_reg32() at all from
vcpu_reg(), and use whatever register number is supplied directly.
The only case we need to find out about the mapping is when we
actively generate a register access, which only occurs when injecting
a fault in a guest.
Cc: stable@vger.kernel.org
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
handling.
PPC: Mostly bug fixes.
ARM: No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite for
IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86: quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new component (in
virt/lib/) that connects VFIO and KVM together. The same infrastructure
will be used for ARM interrupt forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic interrupt
controller will have to wait for 4.5. These will let KVM expose Hyper-V
devices.
- nested virtualization now supports VPID (same as PCID but for vCPUs)
which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for clflushopt,
clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel + IOAPIC/PIC/PIT in
userspace, which reduces the attack surface of the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten to not
require help from the hypervisor.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJWO2IQAAoJEL/70l94x66D/K0H/3AovAgYmJQToZlimsktMk6a
f2xhdIqfU5lIQQh5uNBCfL3o9o8H9Py1ym7aEw3fmztPHHJYc91oTatt2UEKhmEw
VtZHp/dFHt3hwaIdXmjRPEXiYctraKCyrhaUYdWmUYkoKi7lW5OL5h+S7frG2U6u
p/hFKnHRZfXHr6NSgIqvYkKqtnc+C0FWY696IZMzgCksOO8jB1xrxoSN3tANW3oJ
PDV+4og0fN/Fr1capJUFEc/fejREHneANvlKrLaa8ht0qJQutoczNADUiSFLcMPG
iHljXeDsv5eyjMtUuIL8+MPzcrIt/y4rY41ZPiKggxULrXc6H+JJL/e/zThZpXc=
=iv2z
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"First batch of KVM changes for 4.4.
s390:
A bunch of fixes and optimizations for interrupt and time handling.
PPC:
Mostly bug fixes.
ARM:
No big features, but many small fixes and prerequisites including:
- a number of fixes for the arch-timer
- introducing proper level-triggered semantics for the arch-timers
- a series of patches to synchronously halt a guest (prerequisite
for IRQ forwarding)
- some tracepoint improvements
- a tweak for the EL2 panic handlers
- some more VGIC cleanups getting rid of redundant state
x86:
Quite a few changes:
- support for VT-d posted interrupts (i.e. PCI devices can inject
interrupts directly into vCPUs). This introduces a new
component (in virt/lib/) that connects VFIO and KVM together.
The same infrastructure will be used for ARM interrupt
forwarding as well.
- more Hyper-V features, though the main one Hyper-V synthetic
interrupt controller will have to wait for 4.5. These will let
KVM expose Hyper-V devices.
- nested virtualization now supports VPID (same as PCID but for
vCPUs) which makes it quite a bit faster
- for future hardware that supports NVDIMM, there is support for
clflushopt, clwb, pcommit
- support for "split irqchip", i.e. LAPIC in kernel +
IOAPIC/PIC/PIT in userspace, which reduces the attack surface of
the hypervisor
- obligatory smattering of SMM fixes
- on the guest side, stable scheduler clock support was rewritten
to not require help from the hypervisor"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (123 commits)
KVM: VMX: Fix commit which broke PML
KVM: x86: obey KVM_X86_QUIRK_CD_NW_CLEARED in kvm_set_cr0()
KVM: x86: allow RSM from 64-bit mode
KVM: VMX: fix SMEP and SMAP without EPT
KVM: x86: move kvm_set_irq_inatomic to legacy device assignment
KVM: device assignment: remove pointless #ifdefs
KVM: x86: merge kvm_arch_set_irq with kvm_set_msi_inatomic
KVM: x86: zero apic_arb_prio on reset
drivers/hv: share Hyper-V SynIC constants with userspace
KVM: x86: handle SMBASE as physical address in RSM
KVM: x86: add read_phys to x86_emulate_ops
KVM: x86: removing unused variable
KVM: don't pointlessly leave KVM_COMPAT=y in non-KVM configs
KVM: arm/arm64: Merge vgic_set_lr() and vgic_sync_lr_elrsr()
KVM: arm/arm64: Clean up vgic_retire_lr() and surroundings
KVM: arm/arm64: Optimize away redundant LR tracking
KVM: s390: use simple switch statement as multiplexer
KVM: s390: drop useless newline in debugging data
KVM: s390: SCA must not cross page boundaries
KVM: arm: Do not indent the arguments of DECLARE_BITMAP
...
- "genirq: Introduce generic irq migration for cpu hotunplugged" patch
merged from tip/irq/for-arm to allow the arm64-specific part to be
upstreamed via the arm64 tree
- CPU feature detection reworked to cope with heterogeneous systems
where CPUs may not have exactly the same features. The features
reported by the kernel via internal data structures or ELF_HWCAP are
delayed until all the CPUs are up (and before user space starts)
- Support for 16KB pages, with the additional bonus of a 36-bit VA
space, though the latter only depending on EXPERT
- Implement native {relaxed, acquire, release} atomics for arm64
- New ASID allocation algorithm which avoids IPI on roll-over, together
with TLB invalidation optimisations (using local vs global where
feasible)
- KASan support for arm64
- EFI_STUB clean-up and isolation for the kernel proper (required by
KASan)
- copy_{to,from,in}_user optimisations (sharing the memcpy template)
- perf: moving arm64 to the arm32/64 shared PMU framework
- L1_CACHE_BYTES increased to 128 to accommodate Cavium hardware
- Support for the contiguous PTE hint on kernel mapping (16 consecutive
entries may be able to use a single TLB entry)
- Generic CONFIG_HZ now used on arm64
- defconfig updates
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJWOkmIAAoJEGvWsS0AyF7x4GgQAINU3NePjFFvWZNCkqobeH9+
jFKwtXamIudhTSdnXNXyYWmtRL9Krg3qI4zDQf68dvDFAZAze2kVuOi1yPpCbpFZ
/j/afNyQc7+PoyqRAzmT+EMPZlcuOA84Prrl1r3QWZ58QaFeVk/6ZxrHunTHxN0x
mR9PIXfWx73MTo+UnG8FChkmEY6LmV4XpemgTaMR9FqFhdT51OZSxDDAYXOTm4JW
a5HdN9OWjjJ2rhLlFEaC7tszG9B5doHdy2tr5ge/YERVJzIPDogHkMe8ZhfAJc+x
SQU5tKN6Pg4MOi+dLhxlk0/mKCvHLiEQ5KVREJnt8GxupAR54Bat+DQ+rP9cSnpq
dRQTcARIOyy9LGgy+ROAsSo+NiyM5WuJ0/WJUYKmgWTJOfczRYoZv6TMKlwNOUYb
tGLCZHhKPM3yBHJlWbQykl3xmSuudxCMmjlZzg7B+MVfTP6uo0CRSPmYl+v67q+J
bBw/Z2RYXWYGnvlc6OfbMeImI6prXeE36+5ytyJFga0m+IqcTzRGzjcLxKEvdbiU
pr8n9i+hV9iSsT/UwukXZ8ay6zH7PrTLzILWQlieutfXlvha7MYeGxnkbLmdYcfe
GCj374io5cdImHcVKmfhnOMlFOLuOHphl9cmsd/O2LmCIqBj9BIeNH2Om8mHVK2F
YHczMdpESlJApE7kUc1e
=3six
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- "genirq: Introduce generic irq migration for cpu hotunplugged" patch
merged from tip/irq/for-arm to allow the arm64-specific part to be
upstreamed via the arm64 tree
- CPU feature detection reworked to cope with heterogeneous systems
where CPUs may not have exactly the same features. The features
reported by the kernel via internal data structures or ELF_HWCAP are
delayed until all the CPUs are up (and before user space starts)
- Support for 16KB pages, with the additional bonus of a 36-bit VA
space, though the latter only depending on EXPERT
- Implement native {relaxed, acquire, release} atomics for arm64
- New ASID allocation algorithm which avoids IPI on roll-over, together
with TLB invalidation optimisations (using local vs global where
feasible)
- KASan support for arm64
- EFI_STUB clean-up and isolation for the kernel proper (required by
KASan)
- copy_{to,from,in}_user optimisations (sharing the memcpy template)
- perf: moving arm64 to the arm32/64 shared PMU framework
- L1_CACHE_BYTES increased to 128 to accommodate Cavium hardware
- Support for the contiguous PTE hint on kernel mapping (16 consecutive
entries may be able to use a single TLB entry)
- Generic CONFIG_HZ now used on arm64
- defconfig updates
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (91 commits)
arm64/efi: fix libstub build under CONFIG_MODVERSIONS
ARM64: Enable multi-core scheduler support by default
arm64/efi: move arm64 specific stub C code to libstub
arm64: page-align sections for DEBUG_RODATA
arm64: Fix build with CONFIG_ZONE_DMA=n
arm64: Fix compat register mappings
arm64: Increase the max granular size
arm64: remove bogus TASK_SIZE_64 check
arm64: make Timer Interrupt Frequency selectable
arm64/mm: use PAGE_ALIGNED instead of IS_ALIGNED
arm64: cachetype: fix definitions of ICACHEF_* flags
arm64: cpufeature: declare enable_cpu_capabilities as static
genirq: Make the cpuhotplug migration code less noisy
arm64: Constify hwcap name string arrays
arm64/kvm: Make use of the system wide safe values
arm64/debug: Make use of the system wide safe value
arm64: Move FP/ASIMD hwcap handling to common code
arm64/HWCAP: Use system wide safe values
arm64/capabilities: Make use of system wide safe value
arm64: Delay cpu feature capability checks
...
If we panic in hyp mode, we inject a call to panic() into the EL1N host
kernel. If a guest context is active, we first attempt to restore the
minimal amount of state necessary to execute the host kernel with
restore_sysregs.
However, the SP is restored as part of restore_common_regs, and so we
may return to the host's panic() function with the SP of the guest. Any
calculations based on the SP will be bogus, and any attempt to access
the stack will result in recursive data aborts.
When running Linux as a guest, the guest's EL1N SP is like to be some
valid kernel address. In this case, the host kernel may use that region
as a stack for panic(), corrupting it in the process.
Avoid the problem by restoring the host SP prior to returning to the
host. To prevent misleading backtraces in the host, the FP is zeroed at
the same time. We don't need any of the other "common" registers in
order to panic successfully.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: <kvmarm@lists.cs.columbia.edu>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
vhost drivers provide guest VMs with better I/O performance and lower
CPU utilization. This patch allows users to select vhost devices under
KVM configuration menu on ARM. This makes vhost support on arm/arm64
on a par with other architectures (e.g. x86, ppc).
Signed-off-by: Wei Huang <wei@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Use the system wide safe value from the new API for safer
decisions
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: kvmarm@lists.cs.columbia.edu
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Tested-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
This patch turns on the 16K page support in the kernel. We
support 48bit VA (4 level page tables) and 47bit VA (3 level
page tables).
With 16K we can map 128 entries using contiguous bit hint
at level 3 to map 2M using single TLB entry.
TODO: 16K supports 32 contiguous entries at level 2 to get us
1G(which is not yet supported by the infrastructure). That should
be a separate patch altogether.
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Hardware virtualisation of GICv3 is only supported by 64bit hosts for
the moment. Some VGICv3 bits are missing from the 32bit side, and this
patch allows to still be able to build 32bit hosts when CONFIG_ARM_GIC_V3
is selected.
To this end, we introduce a new option, CONFIG_KVM_ARM_VGIC_V3, that is
only enabled on the 64bit side. The selection is done unconditionally
because CONFIG_ARM_GIC_V3 is always enabled on arm64.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- Workaround for a Cortex-A57 erratum
- Bug fix for the debugging infrastructure
- Fix for 32bit guests with more than 4GB of address space
on a 32bit host
- A number of fixes for the (unusual) case when we don't use
the in-kernel GIC emulation
- Removal of ThumbEE handling on arm64, since these have been
dropped from the architecture before anyone actually ever
built a CPU
- Remove the KVM_ARM_MAX_VCPUS limitation which has become
fairly pointless
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV+slQAAoJECPQ0LrRPXpD+WUQAMLC3ZUasJX1gsVixd++zAwB
FXu0TFlKCUsLWllXZtyhGI6ya7ljuCzfhRbA/eZFmFVbwDnULt1p5ahw7eHCIZ2a
yY93TS6XN3YHwVpY7f2lDsvLhBLyWeTdWhj5TtLy6mslQyEUqxdmsiC7gl40Fp2S
8tKIxoYYRpmbgKl/Lbi8GxdHH6c0aQ2Nt7Fq4nV9dJqy5tiGdg6OxqgU/rVmkdkv
Rv1jrdtncstNRi9NBbKRRDp5DTqWboF35HJQpdIRpR8jJTLuuzzCimP5Hz9crKuO
uXchIq2GtQB60NklZtPL15zMdmfdq+JHwdC14v05kB5Ai8NThGwKYQ3JF+krO3cG
RKsAlrIq0AwPN8hAboLcKGzjLFFryaHZsa+d7elxaaDQz1FGz4uP56fIUURoGZuX
vWTsKLRKcuPCYtnV6Frg2BCTB6nq1cRgjmMC9TABnraelZ3z0lDl4wFngg4aL2u6
QYOdP8L++/S1HAPOF7VhFYndXkbM3KoVLAepev8jvzRnwg4QVrqsvfgwFSdMNcMz
ga7bJ4pUEP+Qq1i0qc41P9O708bCGm7TIw3CzTdKIZhc/l0t137lw1rhv67JfXZh
cAni4osjhpdZUT0F9lIl/6OQB3Kgk6on3cs909Y/tT1srh9s+iVO1AwpGY1j5T4j
gFRy90o2LBuepoI/8yF3
=NNtz
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.3-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
Second set of KVM/ARM changes for 4.3-rc2
- Workaround for a Cortex-A57 erratum
- Bug fix for the debugging infrastructure
- Fix for 32bit guests with more than 4GB of address space
on a 32bit host
- A number of fixes for the (unusual) case when we don't use
the in-kernel GIC emulation
- Removal of ThumbEE handling on arm64, since these have been
dropped from the architecture before anyone actually ever
built a CPU
- Remove the KVM_ARM_MAX_VCPUS limitation which has become
fairly pointless
This patch removes config option of KVM_ARM_MAX_VCPUS,
and like other ARCHs, just choose the maximum allowed
value from hardware, and follows the reasons:
1) from distribution view, the option has to be
defined as the max allowed value because it need to
meet all kinds of virtulization applications and
need to support most of SoCs;
2) using a bigger value doesn't introduce extra memory
consumption, and the help text in Kconfig isn't accurate
because kvm_vpu structure isn't allocated until request
of creating VCPU is sent from QEMU;
3) the main effect is that the field of vcpus[] in 'struct kvm'
becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
lines to hold the structure, but 'struct kvm' is one generic struct,
and it has worked well on other ARCHs already in this way. Also,
the world switch frequecy is often low, for example, it is ~2000
when running kernel building load in VM from APM xgene KVM host,
so the effect is very small, and the difference can't be observed
in my test at all.
Cc: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Although the ThumbEE registers and traps were present in earlier
versions of the v8 architecture, it was retrospectively removed and so
we can do the same.
Whilst this breaks migrating a guest started on a previous version of
the kernel, it is much better to kill these (non existent) registers
as soon as possible.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[maz: added commend about migration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When running a guest with the architected timer disabled (with QEMU and
the kernel_irqchip=off option, for example), it is important to make
sure the timer gets turned off. Otherwise, the guest may try to
enable it anyway, leading to a screaming HW interrupt.
The fix is to unconditionally turn off the virtual timer on guest
exit.
Cc: stable@vger.kernel.org
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When setting the debug register from userspace, make sure that
copy_from_user() is called with its parameters in the expected
order. It otherwise doesn't do what you think.
Fixes: 84e690bfbe ("KVM: arm64: introduce vcpu->arch.debug_ptr")
Reported-by: Peter Maydell <peter.maydell@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
When restoring the system register state for an AArch32 guest at EL2,
writes to DACR32_EL2 may not be correctly synchronised by Cortex-A57,
which can lead to the guest effectively running with junk in the DACR
and running into unexpected domain faults.
This patch works around the issue by re-ordering our restoration of the
AArch32 register aliases so that they happen before the AArch64 system
registers. Ensuring that the registers are restored in this order
guarantees that they will be correctly synchronised by the core.
Cc: <stable@vger.kernel.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- Support for new architectural features introduced in ARMv8.1:
* Privileged Access Never (PAN) to catch user pointer dereferences in
the kernel
* Large System Extension (LSE) for building scalable atomics and locks
(depends on locking/arch-atomic from tip, which is included here)
* Hardware Dirty Bit Management (DBM) for updating clean PTEs
automatically
- Move our PSCI implementation out into drivers/firmware/, where it can
be shared with arch/arm/. RMK has also pulled this component branch
and has additional patches moving arch/arm/ over. MAINTAINERS is
updated accordingly.
- Better BUG implementation based on the BRK instruction for trapping
- Leaf TLB invalidation for unmapping user pages
- Support for PROBE_ONLY PCI configurations
- Various cleanups and non-critical fixes, including:
* Always flush FP/SIMD state over exec()
* Restrict memblock additions based on range of linear mapping
* Ensure *(LIST_POISON) generates a fatal fault
* Context-tracking syscall return no longer corrupts return value when
not forced on.
* Alternatives patching synchronisation/stability improvements
* Signed sub-word cmpxchg compare fix (tickled by HAVE_CMPXCHG_LOCAL)
* Force SMP=y
* Hide direct DCC access from userspace
* Fix EFI stub memory allocation when DRAM starts at 0x0
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJV5XXWAAoJELescNyEwWM0p4UIAIQwgoUnj01LvtImjMyG0NiY
38GbAia7FsyIktSjuCaEhLsWjL8WSMscRsz6MLK01ir3iOoKdtXd/OptlsJTV5c5
5POPAU6hvdfKj6MtsaOAOx4dz7bhM/HB9JSZmcbHqytOxIi4Tp1JoBrmM1mpNwmp
VFy+GAOs5H6Lb/xUMm50pVUx+mjMXsH4Bo1c/0Y/gYsjhcvcRgE2iqnl7UExgDcW
5sbhpsdw8zleDx+kzTmt5QoFWk/4l3d/F+0dzLCYfxzCLNYacksbQqEbGFVAsiIl
aACK3Uqk7v7ZtFqqQLtNzE6Pfiw0CzajINPUyykoMCnDtMsyhYbxqezywCAPpSY=
=8qHf
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- Support for new architectural features introduced in ARMv8.1:
* Privileged Access Never (PAN) to catch user pointer dereferences in
the kernel
* Large System Extension (LSE) for building scalable atomics and locks
(depends on locking/arch-atomic from tip, which is included here)
* Hardware Dirty Bit Management (DBM) for updating clean PTEs
automatically
- Move our PSCI implementation out into drivers/firmware/, where it can
be shared with arch/arm/. RMK has also pulled this component branch
and has additional patches moving arch/arm/ over. MAINTAINERS is
updated accordingly.
- Better BUG implementation based on the BRK instruction for trapping
- Leaf TLB invalidation for unmapping user pages
- Support for PROBE_ONLY PCI configurations
- Various cleanups and non-critical fixes, including:
* Always flush FP/SIMD state over exec()
* Restrict memblock additions based on range of linear mapping
* Ensure *(LIST_POISON) generates a fatal fault
* Context-tracking syscall return no longer corrupts return value when
not forced on.
* Alternatives patching synchronisation/stability improvements
* Signed sub-word cmpxchg compare fix (tickled by HAVE_CMPXCHG_LOCAL)
* Force SMP=y
* Hide direct DCC access from userspace
* Fix EFI stub memory allocation when DRAM starts at 0x0
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (92 commits)
arm64: flush FP/SIMD state correctly after execve()
arm64: makefile: fix perf_callchain.o kconfig dependency
arm64: set MAX_MEMBLOCK_ADDR according to linear region size
of/fdt: make memblock maximum physical address arch configurable
arm64: Fix source code file path in comments
arm64: entry: always restore x0 from the stack on syscall return
arm64: mdscr_el1: avoid exposing DCC to userspace
arm64: kconfig: Move LIST_POISON to a safe value
arm64: Add __exception_irq_entry definition for function graph
arm64: mm: ensure patched kernel text is fetched from PoU
arm64: alternatives: ensure secondary CPUs execute ISB after patching
arm64: make ll/sc __cmpxchg_case_##name asm consistent
arm64: dma-mapping: Simplify pgprot handling
arm64: restore cpu suspend/resume functionality
ARM64: PCI: do not enable resources on PROBE_ONLY systems
arm64: cmpxchg: truncate sub-word signed types before comparison
arm64: alternative: put secondary CPUs into polling loop during patch
arm64/Documentation: clarify wording regarding memory below the Image
arm64: lse: fix lse cmpxchg code indentation
arm64: remove redundant object file list
...
When injecting a fault into a misbehaving 32bit guest, it seems
rather idiotic to also inject a 64bit fault that is only going
to corrupt the guest state. This leads to a situation where we
perform an illegal exception return at EL2 causing the host
to crash instead of killing the guest.
Just fix the stupid bug that has been there from day 1.
Cc: <stable@vger.kernel.org>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch only saves and restores FP/SIMD registers on Guest access. To do
this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit.
lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD
context is not saved/restored
[chazy/maz: fixed save/restore logic for 32bit guests]
Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In order to remove the crude hack where we sneak the masked bit
into the timer's control register, make use of the phys_irq_map
API control the active state of the interrupt.
This causes some limited changes to allow for potential error
propagation.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch adds a generic ARM v8 KVM target cpu type for use
by the new CPUs which eventualy ends up using the common sys_reg
table. For backward compatibility the existing targets have been
preserved. Any new target CPU that can be covered by generic v8
sys_reg tables should make use of the new generic target.
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Acked-by: Marc Zyngier <Marc.Zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Convert the dynamic patching for ARM64_HAS_SYSREG_GIC_CPUIF over to
the newly added alternative assembler macros.
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This includes trace points for:
kvm_arch_setup_guest_debug
kvm_arch_clear_guest_debug
I've also added some generic register setting trace events and also a
trace point to dump the array of hardware registers.
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm
support is added this check can be moved to the common
kvm_vm_ioctl_check_extension() code.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This introduces a level of indirection for the debug registers. Instead
of using the sys_regs[] directly we store registers in a structure in
the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the
guest context.
Because we no longer give the sys_regs offset for the sys_reg_desc->reg
field, but instead the index into a debug-specific struct we need to
add a number of additional trap functions for each register. Also as the
generic generic user-space access code no longer works we have
introduced a new pair of function pointers to the sys_reg_desc structure
to override the generic code when needed.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This is a pre-cursor to sharing the code with the guest debug support.
This replaces the big macro that fishes data out of a fixed location
with a more general helper macro to restore a set of debug registers. It
uses macro substitution so it can be re-used for debug control and value
registers. It does however rely on the debug registers being 64 bit
aligned (as they happen to be in the hyp ABI).
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This adds support for single-stepping the guest. To do this we need to
manipulate the guests PSTATE.SS and MDSCR_EL1.SS bits to trigger
stepping. We take care to preserve MDSCR_EL1 and trap access to it to
ensure we don't affect the apparent state of the guest.
As we have to enable trapping of all software debug exceptions we
suppress the ability of the guest to single-step itself. If we didn't we
would have to deal with the exception arriving while the guest was in
kernelspace when the guest is expecting to single-step userspace. This
is something we don't want to unwind in the kernel. Once the host is no
longer debugging the guest its ability to single-step userspace is
restored.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This adds support for SW breakpoints inserted by userspace.
We do this by trapping all guest software debug exceptions to the
hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of
KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the
exception syndrome information.
It will be up to userspace to extract the PC (via GET_ONE_REG) and
determine if the debug event was for a breakpoint it inserted. If not
userspace will need to re-inject the correct exception restart the
hypervisor to deliver the debug exception to the guest.
Any other guest software debug exception (e.g. single step or HW
assisted breakpoints) will cause an error and the VM to be killed. This
is addressed by later patches which add support for the other debug
types.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This is a precursor for later patches which will need to do more to
setup debug state before entering the hyp.S switch code. The existing
functionality for setting mdcr_el2 has been moved out of hyp.S and now
uses the value kept in vcpu->arch.mdcr_el2.
As the assembler used to previously mask and preserve MDCR_EL2.HPMN I've
had to add a mechanism to save the value of mdcr_el2 as a per-cpu
variable during the initialisation code. The kernel never sets this
number so we are assuming the bootcode has set up the correct value
here.
This also moves the conditional setting of the TDA bit from the hyp code
into the C code which is currently used for the lazy debug register
context switch code.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This commit adds a stub function to support the KVM_SET_GUEST_DEBUG
ioctl. Any unsupported flag will return -EINVAL. For now, only
KVM_GUESTDBG_ENABLE is supported, although it won't have any effects.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- CPU ops and PSCI (Power State Coordination Interface) refactoring
following the merging of the arm64 ACPI support, together with
handling of Trusted (secure) OS instances
- Using fixmap for permanent FDT mapping, removing the initial dtb
placement requirements (within 512MB from the start of the kernel
image). This required moving the FDT self reservation out of the
memreserve processing
- Idmap (1:1 mapping used for MMU on/off) handling clean-up
- Removing flush_cache_all() - not safe on ARM unless the MMU is off.
Last stages of CPU power down/up are handled by firmware already
- "Alternatives" (run-time code patching) refactoring and support for
immediate branch patching, GICv3 CPU interface access
- User faults handling clean-up
And some fixes:
- Fix for VDSO building with broken ELF toolchains
- Fixing another case of init_mm.pgd usage for user mappings (during
ASID roll-over broadcasting)
- Fix for FPSIMD reloading after CPU hotplug
- Fix for missing syscall trace exit
- Workaround for .inst asm bug
- Compat fix for switching the user tls tpidr_el0 register
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJVitZgAAoJEGvWsS0AyF7x+ToP/0Yci5bNsYVwVay8N1rK6WHh
SGzDMzyxcSBjQpz2IhhTJ28eTAEH+a+HWQms+0tBehjqxqkvjuzBN0okDkc/z8NB
7Z0BV2aLkQcMwTbjgIh5jm25ZpGmvmvbWPD5oBwgmgQ4v4i1OLRKgx7+YQ+z9rWZ
zC70d0UwyWjs2oxmjd2ZrAkps7v/qozEFhcRHxLzCn8Mbw+3FcTQsqMbfnoWGnH0
YuGfHQQqBY4/HC7uAslMCy7tXeJXqb+NsgrnAovjfEbHGDjsg0KNl06K++LHwE37
A5Noa3M0AQEPYqx/sg0Ec8RNUUEMB4RA2DCaibp7XlVGncXOwFfiyk/M5uVrYXIO
ku5QF0ytUfZKzrMq3yQKbEDuCPOFTqjjdVpkeXKFdW66zYTohKVc3vUBV5xHZ5uO
7Kr8H0ZnhAv3OxPyKdEwAuQ5sJdWwQSvZyGClxMUO4eC/UzD0Mwwf1Y8WYtiAXx+
NSTeBKw/m33W3/KhNuNH1+qGEOKhuXuKX7AcYA84Rab8ytxYWcurHCG2bmhzpEse
2DZtNMybrP/HMQPyJlYgGac8B3QbsAIAkkU1f+dJTAv9otuBDhscaDQyZ9Y6WVht
/k8zJiZeMEuGAmwgTkzLmWs/8pQq42nW4J4eQdXPZAwp4ghCIypPWfaZASAwee6/
p+es3v8P4k9wkv2TFZMh
=YeGl
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"Mostly refactoring/clean-up:
- CPU ops and PSCI (Power State Coordination Interface) refactoring
following the merging of the arm64 ACPI support, together with
handling of Trusted (secure) OS instances
- Using fixmap for permanent FDT mapping, removing the initial dtb
placement requirements (within 512MB from the start of the kernel
image). This required moving the FDT self reservation out of the
memreserve processing
- Idmap (1:1 mapping used for MMU on/off) handling clean-up
- Removing flush_cache_all() - not safe on ARM unless the MMU is off.
Last stages of CPU power down/up are handled by firmware already
- "Alternatives" (run-time code patching) refactoring and support for
immediate branch patching, GICv3 CPU interface access
- User faults handling clean-up
And some fixes:
- Fix for VDSO building with broken ELF toolchains
- Fix another case of init_mm.pgd usage for user mappings (during
ASID roll-over broadcasting)
- Fix for FPSIMD reloading after CPU hotplug
- Fix for missing syscall trace exit
- Workaround for .inst asm bug
- Compat fix for switching the user tls tpidr_el0 register"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
arm64: use private ratelimit state along with show_unhandled_signals
arm64: show unhandled SP/PC alignment faults
arm64: vdso: work-around broken ELF toolchains in Makefile
arm64: kernel: rename __cpu_suspend to keep it aligned with arm
arm64: compat: print compat_sp instead of sp
arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
arm64: entry: fix context tracking for el0_sp_pc
arm64: defconfig: enable memtest
arm64: mm: remove reference to tlb.S from comment block
arm64: Do not attempt to use init_mm in reset_context()
arm64: KVM: Switch vgic save/restore to alternative_insn
arm64: alternative: Introduce feature for GICv3 CPU interface
arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
arm64: fix bug for reloading FPSIMD state after CPU hotplug.
arm64: kernel thread don't need to save fpsimd context.
arm64: fix missing syscall trace exit
arm64: alternative: Work around .inst assembler bugs
arm64: alternative: Merge alternative-asm.h into alternative.h
arm64: alternative: Allow immediate branch as alternative instruction
arm64: Rework alternate sequence for ARM erratum 845719
...
The GIC Hypervisor Configuration Register is used to enable
the delivery of virtual interupts to a guest, as well as to
define in which conditions maintenance interrupts are delivered
to the host.
This register doesn't contain any information that we need to
read back (the EOIcount is utterly useless for us).
So let's save ourselves some cycles, and not save it before
writing zero to it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The elr_el2 and spsr_el2 registers in fact contain the processor state
before entry into EL2. In the case of guest state it could be in either
el0 or el1.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
The KVM-VFIO device is used by the QEMU VFIO device. It is used to
record the list of in-use VFIO groups so that KVM can manipulate
them.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
So far, we configured the world-switch by having a small array
of pointers to the save and restore functions, depending on the
GIC used on the platform.
Loading these values each time is a bit silly (they never change),
and it makes sense to rely on the instruction patching instead.
This leads to a nice cleanup of the code.
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The main change here is a significant head.S rework that allows us to
boot on machines with physical memory at a really high address without
having to increase our mapped VA range. Other changes include:
- AES performance boost for Cortex-A57
- AArch32 (compat) userspace with 64k pages
- Cortex-A53 erratum workaround for #845719
- defconfig updates (new platforms, PCI, ...)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJVLnQpAAoJELescNyEwWM03RIH/iwcDc0MBZgkwfD5cnY+29p4
m89lMDo3SyGQT4NynHSw7P3R7c3zULmI+9hmJMw/yfjjjL6m7X+vVAF3xj1Am4Al
OzCqYLHyFnlRktzJ6dWeF1Ese7tWqPpxn+OCXgYNpz/r5MfF/HhlyX/qNzAQPKrw
ZpDvnt44DgUfweqjTbwQUg2wkyCRjmz57MQYxDcmJStdpHIu24jWOvDIo3OJGjyS
L49I9DU6DGUhkISZmmBE0T7vmKMD1BcgI7OIzX2WIqn521QT+GSLMhRxaHmK1s1V
A8gaMTwpo0xFhTAt7sbw/5+2663WmfRdZI+FtduvORsoxX6KdDn7DH1NQixIm8s=
=+F0I
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Here are the core arm64 updates for 4.1.
Highlights include a significant rework to head.S (allowing us to boot
on machines with physical memory at a really high address), an AES
performance boost on Cortex-A57 and the ability to run a 32-bit
userspace with 64k pages (although this requires said userspace to be
built with a recent binutils).
The head.S rework spilt over into KVM, so there are some changes under
arch/arm/ which have been acked by Marc Zyngier (KVM co-maintainer).
In particular, the linker script changes caused us some issues in
-next, so there are a few merge commits where we had to apply fixes on
top of a stable branch.
Other changes include:
- AES performance boost for Cortex-A57
- AArch32 (compat) userspace with 64k pages
- Cortex-A53 erratum workaround for #845719
- defconfig updates (new platforms, PCI, ...)"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (39 commits)
arm64: fix midr range for Cortex-A57 erratum 832075
arm64: errata: add workaround for cortex-a53 erratum #845719
arm64: Use bool function return values of true/false not 1/0
arm64: defconfig: updates for 4.1
arm64: Extract feature parsing code from cpu_errata.c
arm64: alternative: Allow immediate branch as alternative instruction
arm64: insn: Add aarch64_insn_decode_immediate
ARM: kvm: round HYP section to page size instead of log2 upper bound
ARM: kvm: assert on HYP section boundaries not actual code size
arm64: head.S: ensure idmap_t0sz is visible
arm64: pmu: add support for interrupt-affinity property
dt: pmu: extend ARM PMU binding to allow for explicit interrupt affinity
arm64: head.S: ensure visibility of page tables
arm64: KVM: use ID map with increased VA range if required
arm64: mm: increase VA range of identity map
ARM: kvm: implement replacement for ld's LOG2CEIL()
arm64: proc: remove unused cpu_get_pgd macro
arm64: enforce x1|x2|x3 == 0 upon kernel entry as per boot protocol
arm64: remove __calc_phys_offset
arm64: merge __enable_mmu and __turn_mmu_on
...
virt/kvm was never really a good include directory for anything else
than locally included headers.
With the move of iodev.h there is no need anymore to add this
directory the compiler's include path, so remove it from the arm and
arm64 kvm Makefile.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
This patch modifies the HYP init code so it can deal with system
RAM residing at an offset which exceeds the reach of VA_BITS.
Like for EL1, this involves configuring an additional level of
translation for the ID map. However, in case of EL2, this implies
that all translations use the extra level, as we cannot seamlessly
switch between translation tables with different numbers of
translation levels.
So add an extra translation table at the root level. Since the
ID map and the runtime HYP map are guaranteed not to overlap, they
can share this root level, and we can essentially merge these two
tables into one.
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch enables irqfd on arm/arm64.
Both irqfd and resamplefd are supported. Injection is implemented
in vgic.c without routing.
This patch enables CONFIG_HAVE_KVM_EVENTFD and CONFIG_HAVE_KVM_IRQFD.
KVM_CAP_IRQFD is now advertised. KVM_CAP_IRQFD_RESAMPLE capability
automatically is advertised as soon as CONFIG_HAVE_KVM_IRQFD is set.
Irqfd injection is restricted to SPI. The rationale behind not
supporting PPI irqfd injection is that any device using a PPI would
be a private-to-the-CPU device (timer for instance), so its state
would have to be context-switched along with the VCPU and would
require in-kernel wiring anyhow. It is not a relevant use case for
irqfds.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
CONFIG_HAVE_KVM_IRQCHIP is needed to support IRQ routing (along
with irq_comm.c and irqchip.c usage). This is not the case for
arm/arm64 currently.
This patch unsets the flag for both arm and arm64.
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
We can definitely decide at run-time whether to use the GIC and timers
or not, and the extra code and data structures that we allocate space
for is really negligable with this config option, so I don't think it's
worth the extra complexity of always having to define stub static
inlines. The !CONFIG_KVM_ARM_VGIC/TIMER case is pretty much an untested
code path anyway, so we're better off just getting rid of it.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Common: Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other architectures).
This can improve latency up to 50% on some scenarios (e.g. O_DSYNC writes
or TCP_RR netperf tests). This also has to be enabled manually for now,
but the plan is to auto-tune this in the future.
ARM/ARM64: the highlights are support for GICv3 emulation and dirty page
tracking
s390: several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS: Bugfixes.
x86: Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested virtualization
improvements (nested APICv---a nice optimization), usual round of emulation
fixes. There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
ARM has other conflicts where functions are added in the same place
by 3.19-rc and 3.20 patches. These are not large though, and entirely
within KVM.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJU28rkAAoJEL/70l94x66DXqQH/1TDOfJIjW7P2kb0Sw7Fy1wi
cEX1KO/VFxAqc8R0E/0Wb55CXyPjQJM6xBXuFr5cUDaIjQ8ULSktL4pEwXyyv/s5
DBDkN65mriry2w5VuEaRLVcuX9Wy+tqLQXWNkEySfyb4uhZChWWHvKEcgw5SqCyg
NlpeHurYESIoNyov3jWqvBjr4OmaQENyv7t2c6q5ErIgG02V+iCux5QGbphM2IC9
LFtPKxoqhfeB2xFxTOIt8HJiXrZNwflsTejIlCl/NSEiDVLLxxHCxK2tWK/tUXMn
JfLD9ytXBWtNMwInvtFm4fPmDouv2VDyR0xnK2db+/axsJZnbxqjGu1um4Dqbak=
=7gdx
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
"Fairly small update, but there are some interesting new features.
Common:
Optional support for adding a small amount of polling on each HLT
instruction executed in the guest (or equivalent for other
architectures). This can improve latency up to 50% on some
scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
also has to be enabled manually for now, but the plan is to
auto-tune this in the future.
ARM/ARM64:
The highlights are support for GICv3 emulation and dirty page
tracking
s390:
Several optimizations and bugfixes. Also a first: a feature
exposed by KVM (UUID and long guest name in /proc/sysinfo) before
it is available in IBM's hypervisor! :)
MIPS:
Bugfixes.
x86:
Support for PML (page modification logging, a new feature in
Broadwell Xeons that speeds up dirty page tracking), nested
virtualization improvements (nested APICv---a nice optimization),
usual round of emulation fixes.
There is also a new option to reduce latency of the TSC deadline
timer in the guest; this needs to be tuned manually.
Some commits are common between this pull and Catalin's; I see you
have already included his tree.
Powerpc:
Nothing yet.
The KVM/PPC changes will come in through the PPC maintainers,
because I haven't received them yet and I might end up being
offline for some part of next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
KVM: ia64: drop kvm.h from installed user headers
KVM: x86: fix build with !CONFIG_SMP
KVM: x86: emulate: correct page fault error code for NoWrite instructions
KVM: Disable compat ioctl for s390
KVM: s390: add cpu model support
KVM: s390: use facilities and cpu_id per KVM
KVM: s390/CPACF: Choose crypto control block format
s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
KVM: s390: reenable LPP facility
KVM: s390: floating irqs: fix user triggerable endless loop
kvm: add halt_poll_ns module parameter
kvm: remove KVM_MMIO_SIZE
KVM: MIPS: Don't leak FPU/DSP to guest
KVM: MIPS: Disable HTW while in guest
KVM: nVMX: Enable nested posted interrupt processing
KVM: nVMX: Enable nested virtual interrupt delivery
KVM: nVMX: Enable nested apic register virtualization
KVM: nVMX: Make nested control MSRs per-cpu
KVM: nVMX: Enable nested virtualize x2apic mode
KVM: nVMX: Prepare for using hardware MSR bitmap
...
- reimplementation of the virtual remapping of UEFI Runtime Services in
a way that is stable across kexec
- emulation of the "setend" instruction for 32-bit tasks (user
endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
accordingly)
- compat_sys_call_table implemented in C (from asm) and made it a
constant array together with sys_call_table
- export CPU cache information via /sys (like other architectures)
- DMA API implementation clean-up in preparation for IOMMU support
- macros clean-up for KVM
- dropped some unnecessary cache+tlb maintenance
- CONFIG_ARM64_CPU_SUSPEND clean-up
- defconfig update (CPU_IDLE)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJU25v3AAoJEGvWsS0AyF7xYjcP/j8ESvs+z0BPgeJ6XREfOnCh
cp+w/1rJ5BafJ5RRkibrciwTNOIJS4FGMivWyURtoh430lS0Rh7fxZ3Ouna3xjrT
Nf7AxenWoA8Lo6wHh+FlNUeGk3iWfX6WwA2tYrbKudK+LBJ1wHjwpE7cWQO0FgwJ
aFDahu+QD5/u45p/VcVctMtiEDvOxBdO8gfat6r+YkLm7pbRxQkZnpA/JE4Gps1p
Td5jvMNH9pXI5pffSbeR9Q+vs/r0yqKLXQg01Eb2bZgGDgwf9yzADrHuaKamZt35
X5flmLiTGC6swJCJvUkZC1Nuue33bXcvW5+vgvar+MNGyXsxv+B/wARLqGhiWhQZ
nLGwFpuNu6wdY9tGHb/XR8khcewkw1/lRH1hHKhchrmRyUqHvXcPgC5tamjLrY8C
BV3BAeQvRho8OKwWUmbXIlyON1vPux6CJdj4D/A5NL+qph2WHeVWJCXg6nVFx0Wc
Eb3bXbI4QRwTFL7pGRF8RyZJBAQtgYhQMKWMW2GHgUgn+r1EixG73BZoSwvpHrrw
FOR9AVNfVBqmNON8xiIb3DN4EViq76EF0jrsZh5I9EoWS2w5qtk60kJQgXE+M4EE
vOlmh3dhEVfCN2SxOn0bgoQmTulyjqGauTSSJKQbIBuinPFveukrJfGNFIWt0SZs
f38FBMo6sgU4VG85B+Fr
=X5x/
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"arm64 updates for 3.20:
- reimplementation of the virtual remapping of UEFI Runtime Services
in a way that is stable across kexec
- emulation of the "setend" instruction for 32-bit tasks (user
endianness switching trapped in the kernel, SCTLR_EL1.E0E bit set
accordingly)
- compat_sys_call_table implemented in C (from asm) and made it a
constant array together with sys_call_table
- export CPU cache information via /sys (like other architectures)
- DMA API implementation clean-up in preparation for IOMMU support
- macros clean-up for KVM
- dropped some unnecessary cache+tlb maintenance
- CONFIG_ARM64_CPU_SUSPEND clean-up
- defconfig update (CPU_IDLE)
The EFI changes going via the arm64 tree have been acked by Matt
Fleming. There is also a patch adding sys_*stat64 prototypes to
include/linux/syscalls.h, acked by Andrew Morton"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (47 commits)
arm64: compat: Remove incorrect comment in compat_siginfo
arm64: Fix section mismatch on alloc_init_p[mu]d()
arm64: Avoid breakage caused by .altmacro in fpsimd save/restore macros
arm64: mm: use *_sect to check for section maps
arm64: drop unnecessary cache+tlb maintenance
arm64:mm: free the useless initial page table
arm64: Enable CPU_IDLE in defconfig
arm64: kernel: remove ARM64_CPU_SUSPEND config option
arm64: make sys_call_table const
arm64: Remove asm/syscalls.h
arm64: Implement the compat_sys_call_table in C
syscalls: Declare sys_*stat64 prototypes if __ARCH_WANT_(COMPAT_)STAT64
compat: Declare compat_sys_sigpending and compat_sys_sigprocmask prototypes
arm64: uapi: expose our struct ucontext to the uapi headers
smp, ARM64: Kill SMP single function call interrupt
arm64: Emulate SETEND for AArch32 tasks
arm64: Consolidate hotplug notifier for instruction emulation
arm64: Track system support for mixed endian EL0
arm64: implement generic IOMMU configuration
arm64: Combine coherent and non-coherent swiotlb dma_ops
...
Pull RCU updates from Ingo Molnar:
"The main RCU changes in this cycle are:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates"
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
rcu: Initialize tiny RCU stall-warning timeouts at boot
rcu: Fix RCU CPU stall detection in tiny implementation
rcu: Add GP-kthread-starvation checks to CPU stall warnings
rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
rcu: Optionally run grace-period kthreads at real-time priority
ksoftirqd: Use new cond_resched_rcu_qs() function
ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
rcutorture: Add more diagnostics in rcu_barrier() test failure case
torture: Flag console.log file to prevent holdovers from earlier runs
torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
rcutorture: Handle different mpstat versions
rcutorture: Check from beginning to end of grace period
rcu: Remove redundant rcu_batches_completed() declaration
rcutorture: Drop rcu_torture_completed() and friends
rcu: Provide rcu_batches_completed_sched() for TINY_RCU
rcutorture: Use unsigned for Reader Batch computations
rcutorture: Make build-output parsing correctly flag RCU's warnings
rcu: Make _batches_completed() functions return unsigned long
rcutorture: Issue warnings on close calls due to Reader Batch blows
documentation: Fix smp typo in memory-barriers.txt
...
Trying to emulate the behaviour of set/way cache ops is fairly
pointless, as there are too many ways we can end-up missing stuff.
Also, there is some system caches out there that simply ignore
set/way operations.
So instead of trying to implement them, let's convert it to VA ops,
and use them as a way to re-enable the trapping of VM ops. That way,
we can detect the point when the MMU/caches are turned off, and do
a full VM flush (which is what the guest was trying to do anyway).
This allows a 32bit zImage to boot on the APM thingy, and will
probably help bootloaders in general.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While the generation of a (virtual) inter-processor interrupt (SGI)
on a GICv2 works by writing to a MMIO register, GICv3 uses the system
register ICC_SGI1R_EL1 to trigger them.
Add a trap handler function that calls the new SGI register handler
in the GICv3 code. As ICC_SRE_EL1.SRE at this point is still always 0,
this will not trap yet, but will only be used later when all the data
structures have been initialized properly.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
With everything separated and prepared, we implement a model of a
GICv3 distributor and redistributors by using the existing framework
to provide handler functions for each register group.
Currently we limit the emulation to a model enforcing a single
security state, with SRE==1 (forcing system register access) and
ARE==1 (allowing more than 8 VCPUs).
We share some of the functions provided for GICv2 emulation, but take
the different ways of addressing (v)CPUs into account.
Save and restore is currently not implemented.
Similar to the split-off of the GICv2 specific code, the new emulation
code goes into a new file (vgic-v3-emul.c).
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
vgic.c is currently a mixture of generic vGIC emulation code and
functions specific to emulating a GICv2. To ease the addition of
GICv3, split off strictly v2 specific parts into a new file
vgic-v2-emul.c.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
-------
As the diff isn't always obvious here (and to aid eventual rebases),
here is a list of high-level changes done to the code:
* added new file to respective arm/arm64 Makefiles
* moved GICv2 specific functions to vgic-v2-emul.c:
- handle_mmio_misc()
- handle_mmio_set_enable_reg()
- handle_mmio_clear_enable_reg()
- handle_mmio_set_pending_reg()
- handle_mmio_clear_pending_reg()
- handle_mmio_priority_reg()
- vgic_get_target_reg()
- vgic_set_target_reg()
- handle_mmio_target_reg()
- handle_mmio_cfg_reg()
- handle_mmio_sgi_reg()
- vgic_v2_unqueue_sgi()
- read_set_clear_sgi_pend_reg()
- write_set_clear_sgi_pend_reg()
- handle_mmio_sgi_set()
- handle_mmio_sgi_clear()
- vgic_v2_handle_mmio()
- vgic_get_sgi_sources()
- vgic_dispatch_sgi()
- vgic_v2_queue_sgi()
- vgic_v2_map_resources()
- vgic_v2_init()
- vgic_v2_add_sgi_source()
- vgic_v2_init_model()
- vgic_v2_init_emulation()
- handle_cpu_mmio_misc()
- handle_mmio_abpr()
- handle_cpu_mmio_ident()
- vgic_attr_regs_access()
- vgic_create() (renamed to vgic_v2_create())
- vgic_destroy() (renamed to vgic_v2_destroy())
- vgic_has_attr() (renamed to vgic_v2_has_attr())
- vgic_set_attr() (renamed to vgic_v2_set_attr())
- vgic_get_attr() (renamed to vgic_v2_get_attr())
- struct kvm_mmio_range vgic_dist_ranges[]
- struct kvm_mmio_range vgic_cpu_ranges[]
- struct kvm_device_ops kvm_arm_vgic_v2_ops {}
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
GIC CPU interface for EL1 (guests). Currently we force it to 0, but
for proper GICv3 support we have to allow guests to use it (depending
on their selected virtual GIC model).
So add ICC_SRE_EL1 to the list of saved/restored registers on a
world switch, but actually disallow a guest to change it by only
restoring a fixed, once-initialized value.
This value depends on the GIC model userland has chosen for a guest.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The virtual MPIDR registers (containing topology information) for the
guest are currently mapped linearily to the vcpu_id. Improve this
mapping for arm64 by using three levels to not artificially limit the
number of vCPUs.
To help this, change and rename the kvm_vcpu_get_mpidr() function to
mask off the non-affinity bits in the MPIDR register.
Also add an accessor to later allow easier access to a vCPU with a
given MPIDR. Use this new accessor in the PSCI emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>