Expose the KVM guest CP0_Count frequency to userland via a new
KVM_REG_MIPS_COUNT_HZ register accessible with the KVM_{GET,SET}_ONE_REG
ioctls.
When the frequency is altered the bias is adjusted such that the guest
CP0_Count doesn't jump discontinuously or lose any timer interrupts.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Expose two new virtual registers to userland via the
KVM_{GET,SET}_ONE_REG ioctls.
KVM_REG_MIPS_COUNT_CTL is for timer configuration fields and just
contains a master disable count bit. This can be used by userland to
freeze the timer in order to read a consistent state from the timer
count value and timer interrupt pending bit. This cannot be done with
the CP0_Cause.DC bit because the timer interrupt pending bit (TI) is
also in CP0_Cause so it would be impossible to stop the timer without
also risking a race with an hrtimer interrupt and having to explicitly
check whether an interrupt should have occurred.
When the timer is re-enabled it resumes without losing time, i.e. the
CP0_Count value jumps to what it would have been had the timer not been
disabled, which would also be impossible to do from userland with
CP0_Cause.DC. The timer interrupt also cannot be lost, i.e. if a timer
interrupt would have occurred had the timer not been disabled it is
queued when the timer is re-enabled.
This works by storing the nanosecond monotonic time when the master
disable is set, and using it for various operations instead of the
current monotonic time (e.g. when recalculating the bias when the
CP0_Count is set), until the master disable is cleared again, i.e. the
timer state is read/written as it would have been at that time. This
state is exposed to userland via the read-only KVM_REG_MIPS_COUNT_RESUME
virtual register so that userland can determine the exact time the
master disable took effect.
This should allow userland to atomically save the state of the timer,
and later restore it.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The KVM_HOST_FREQ Kconfig symbol was used by KVM guest kernels to
override the timer frequency calculation to a value based on the host
frequency. Now that the KVM timer emulation is implemented independent
of the host timer frequency and defaults to 100MHz, adjust the working
of CONFIG_KVM_HOST_FREQ to match.
The Kconfig symbol now specifies the guest timer frequency directly, and
has been renamed accordingly to KVM_GUEST_TIMER_FREQ. It now defaults to
100MHz too and the help text is updated to make it clear that a zero
value will allow the normal timer frequency calculation to take place
(based on the emulated RTC).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Previously the emulation of the CPU timer was just enough to get a Linux
guest running but some shortcuts were taken:
- The guest timer interrupt was hard coded to always happen every 10 ms
rather than being timed to when CP0_Count would match CP0_Compare.
- The guest's CP0_Count register was based on the host's CP0_Count
register. This isn't very portable and fails on cores without a
CP_Count register implemented such as Ingenic XBurst. It also meant
that the guest's CP0_Cause.DC bit to disable the CP0_Count register
took no effect.
- The guest's CP0_Count register was emulated by just dividing the
host's CP0_Count register by 4. This resulted in continuity problems
when used as a clock source, since when the host CP0_Count overflows
from 0x7fffffff to 0x80000000, the guest CP0_Count transitions
discontinuously from 0x1fffffff to 0xe0000000.
Therefore rewrite & fix emulation of the guest timer based on the
monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias
value is added to the frequency scaled nanosecond monotonic time to get
the guest's CP0_Count. The frequency of the timer is initialised to
100MHz and cannot yet be changed, but a later patch will allow the
frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl
interface.
The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
via the KVM_SET_ONE_REG ioctl interface), at which point the current
CP0_Count is stored and can be read directly. When it is restarted the
bias is recalculated such that the CP0_Count value is continuous.
Due to the nature of hrtimer interrupts any read of the guest's
CP0_Count register while it is running triggers a check for whether the
hrtimer has expired, so that the guest/userland cannot observe the
CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
also taken advantage of when stopping the timer to ensure that a pending
timer interrupt is queued.
This replaces the implementation of:
- Guest read of CP0_Count
- Guest write of CP0_Count
- Guest write of CP0_Compare
- Guest write of CP0_Cause
- Guest read of HWR 2 (CC) with RDHWR
- Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
- Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
- Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
- Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When a VCPU is scheduled in on a different CPU, refresh the hrtimer used
for emulating count/compare so that it gets migrated to the same CPU.
This should prevent a timer interrupt occurring on a different CPU to
where the guest it relates to is running, which would cause the guest
timer interrupt not to be delivered until after the next guest exit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The hrtimer callback for guest timer timeouts sets the guest's
CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
pending, however there is no mutual exclusion implemented to prevent
this occurring while the guest's CP0_Cause register is being
read-modify-written elsewhere.
When this occurs the setting of the CP0_Cause.TI bit is undone and the
guest misses the timer interrupt and doesn't reprogram the CP0_Compare
register for the next timeout. Currently another timer interrupt will be
triggered again in another 10ms anyway due to the way timers are
emulated, but after the MIPS timer emulation is fixed this would result
in Linux guest time standing still and the guest scheduler not being
invoked until the guest CP0_Count has looped around again, which at
100MHz takes just under 43 seconds.
Currently this is the only asynchronous modification of guest registers,
therefore it is fixed by adjusting the implementations of the
kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and
kvm_change_c0_guest_cause() macros which are used for modifying the
guest CP0_Cause register to use ll/sc to ensure atomic modification.
This should work in both UP and SMP cases without requiring interrupts
to be disabled.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When about to run the guest, deliver guest interrupts after disabling
host interrupts. This should prevent an hrtimer interrupt from being
handled after delivering guest interrupts, and therefore not delivering
the guest timer interrupt until after the next guest exit.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
HWREna register. This is so that userland can save and restore its
value so that RDHWR instructions don't have to be emulated by the guest.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
UserLocal register. This is so that userland can save and restore its
value.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
Count and Compare registers. These registers are special in that writing
to them has side effects (adjusting the time until the next timer
interrupt) and reading of Count depends on the time. Therefore add a
couple of callbacks so that different implementations (trap & emulate or
VZ) can implement them differently depending on what the hardware
provides.
The trap & emulate versions mostly duplicate what happens when a T&E
guest reads or writes these registers, so it inherits the same
limitations which can be fixed in later patches.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
kvm_mips.c to kvm_host.h so that they can be shared between multiple
source files. This allows register access to be indirected depending on
the underlying implementation (trap & emulate or VZ).
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Contrary to the comment, the guest CP0_EPC register cannot be set via
kvm_regs, since it is distinct from the guest PC. Add the EPC register
to the KVM_{GET,SET}_ONE_REG ioctl interface.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: David Daney <david.daney@cavium.com>
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When MIPS KVM needs to write a TLB entry for the guest it reads the
CP0_Random register, uses it to generate the CP_Index, and writes the
TLB entry using the TLBWI instruction (tlb_write_indexed()).
However there's an instruction for that, TLBWR (tlb_write_random()) so
use that instead.
This happens to also fix an issue with Ingenic XBurst cores where the
same TLB entry is replaced each time preventing forward progress on
stores due to alternating between TLB load misses for the instruction
fetch and TLB store misses.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
MIPS KVM uses mips32_SyncICache to synchronise the icache with the
dcache after dynamically modifying guest instructions or writing guest
exception vector. However this uses rdhwr to get the SYNCI step, which
causes a reserved instruction exception on Ingenic XBurst cores.
It would seem to make more sense to use local_flush_icache_range()
instead which does the same thing but is more portable.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Export the local_flush_icache_range function pointer for GPL modules so
that it can be used by KVM for syncing the icache after binary
translation of trapping instructions.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Each MIPS KVM guest has its own copy of the KVM exception vector. This
contains the TLB refill exception handler at offset 0x000, the general
exception handler at offset 0x180, and interrupt exception handlers at
offset 0x200 in case Cause_IV=1. A common handler is copied to offset
0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
from guest.
However the amount of memory allocated for this purpose is calculated as
0x200 rounded up to the next page boundary, which is insufficient if 4KB
pages are in use. This can lead to the common handler at offset 0x2000
being overwritten and infinitely recursive exceptions on the next exit
from the guest.
Increase the minimum size from 0x200 to 0x4000 to cover the full use of
the page.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Sanjay Lal <sanjayl@kymasys.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2. Fix flag check for gdb support
3. Remove unnecessary vcpu start
4. Remove code duplication for sigp interrupts
5. Better DAT handling for the TPROT instruction
6. Correct addressing exception for standby memory
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJTiD4+AAoJEBF7vIC1phx88poQALJkg1EZLePQBwuqe3XKeUnV
AxbtP1T5++Pv+JpXj/mTmI4ToE3U+t66iObCZ8GAWZtXNr47yJQRetTqSU1G8BFe
eLpuWhyaQD31awp5M9sWIAzNjqnO4vR0f6kbVeoXbRmPzdZY19QaQb5QIRFMA/GT
hSYlNoftRfM/e2ZRFUvqiMIKFHs2r/f9T+7SMRhu3YeZo42N3KOnIwFQrLCoaP+x
8GBmxfl0OaEU9Rer6/MvyJVdMHD5ap4UyreaB5PLVtUNGIrcAJ2vjQGbF78bRonA
9SwJ0mMEfzJoH6gZXDEby1e/p7MJ80wHf9jMoy2u2qtyvNuQIyBWS1eco7LH5dpd
pxXcViLC8proiXfFK/6N4ra5cr3auMAmA5iX1LjD+ZxhAjlA+WTcC31jSfueDn33
mBpDASLcoJXbPoEhWiAjP8wkgfUAQnr5CPu5lrmloOrj5llYx0FL2gDhX5dXe7op
5orTAcEtfSxqosK/Av2oQnmUU61BkI4UuBwCbpqM/1w8oKcYX4jUE5Yx/Kmej80Z
MtpYUnBDmJaYPogddr8q5QcnRyLEoRUKfKu5lDnlLaP1oi7vr+WBR+BkNLl0/o7o
+XVdruc7OtC/Ahtep8unkLV5C8j1siXergjvHXhHjpGNsIQ2tfD8LO55ST4vYkUd
GAosCgM/6wt8YmK67qc+
=84hG
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-20140530' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
1. Several minor fixes and cleanups for KVM:
2. Fix flag check for gdb support
3. Remove unnecessary vcpu start
4. Remove code duplication for sigp interrupts
5. Better DAT handling for the TPROT instruction
6. Correct addressing exception for standby memory
Based on original patch from Jeng-fang (Nick) Wang
When standby memory is specified for a guest Linux, but no virtual memory has
been allocated on the Qemu host backing that guest, the guest memory detection
process encounters a memory access exception which is not thrown from the KVM
handle_tprot() instruction-handler function. The access exception comes from
sie64a returning EFAULT, which then passes an addressing exception to the guest.
Unfortunately this does not the proper PSW fixup (nullifying vs.
suppressing) so the guest will get a fault for the wrong address.
Let's just intercept the tprot instruction all the time to do the right thing
and not go the page fault handler path for standby memory. tprot is only used
by Linux during startup so some exits should be ok.
Without this patch, standby memory cannot be used with KVM.
Signed-off-by: Nick Wang <jfwang@us.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch removes the start of a VCPU when delivering a RESTART interrupt.
Interrupt delivery is called from kvm_arch_vcpu_ioctl_run. So the VCPU is
already considered started - no need to call kvm_s390_vcpu_start. This function
will early exit anyway.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch fixes a minor bug when updating the guest debug settings.
We should check the given debug flags, not the already set ones.
Doesn't do any harm but too many (for now unused) flags could be set internally
without error.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
We have all the logic to inject interrupts available in
kvm_s390_inject_vcpu(), so let's use it instead of
injecting irqs manually to the list in sigp code.
SIGP stop is special because we have to check the
action_flags before injecting the interrupt. As
the action_flags are not available in kvm_s390_inject_vcpu()
we leave the code for the stop order code untouched for now.
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The TPROT instruction can be used to check the accessability of storage
for any kind of logical addresses. So far, our handler only supported
real addresses. This patch now also enables support for addresses that
have to be translated via DAT first. And while we're at it, change the
code to use the common KVM function gfn_to_hva_prot() to check for the
validity and writability of the memory page.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
This patch adds a function for translating logical guest addresses into
physical guest addresses without touching the memory at the given location.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The memory alias support has been removed since a1f4d39500 (KVM: Remove
memory alias support). So remove unalias_gfn from the MIPS port.
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit e71246a23a changes psci_init from a
function returning a void to an int, but does not change the non
CONFIG_ARM_PSCI implementation to return a value, which causes a compile
warning. Just return 0.
Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Cc: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This includes KVM support for PSCI v0.2 and also includes generic Linux
support for PSCI v0.2 (on hosts that advertise that feature via their
DT), since the latter depends on headers introduced by the former.
Finally there's a small patch from Marc that enables Cortex-A53 support.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJTgjKZAAoJEEtpOizt6ddyi2gIAIO93NUZqQfJ/HmGo0cqvTsA
OixRhSdAIgigzIavuGfp8UvTtk8WH82Qo6G7sy8/UveT1uoc3hZkfxkASLfjELw4
rKhfMGwfXifC5zrgQ9+h1CM77lLpWMU1+PAqUOO2TZXjlOHZxSpx5AdfY03aGxvb
sL+ovj02eGXB0IxR7dNI4XPIRS7ny+2OOzoKKH4u6ogQlwm96pptQ634sWUSM+mB
dedJBLZHEattn5GLh+QnvDvdrROgE5wR/Ji4PX2YdXoaeEjiz0dcmLxJknj7zhj8
jwq4NulpV2FQx5gc/KgpifCtmRo87mWgKNm6A2xJjRigcn00ekeAlADcwA4sppo=
=J4JR
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next
Changed for the 3.16 merge window.
This includes KVM support for PSCI v0.2 and also includes generic Linux
support for PSCI v0.2 (on hosts that advertise that feature via their
DT), since the latter depends on headers introduced by the former.
Finally there's a small patch from Marc that enables Cortex-A53 support.
MOV CR/DR instructions ignore the mod field (in the ModR/M byte). As the SDM
states: "The 2 bits in the mod field are ignored". Accordingly, the second
operand of these instructions is always a general purpose register.
The current emulator implementation does not do so. If the mod bits do not
equal 3, it expects the second operand to be in memory.
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When Hyper-V enlightenments are in effect, Windows prefers to issue an
Hyper-V MSR write to issue an EOI rather than an x2apic MSR write.
The Hyper-V MSR write is not handled by the processor, and besides
being slower, this also causes bugs with APIC virtualization. The
reason is that on EOI the processor will modify the highest in-service
interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
the SDM; every other step in EOI virtualization is already done by
apic_send_eoi or on VM entry, but this one is missing.
We need to do the same, and be careful not to muck with the isr_count
and highest_isr_cache fields that are unused when virtual interrupt
delivery is enabled.
Cc: stable@vger.kernel.org
Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In order to allow KVM to run on Cortex-A53 implementations, wire the
minimal support required.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
The DR7 masking which is done on task switch emulation should be in hex format
(clearing the local breakpoints enable bits 0,2,4 and 6).
Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
I noticed on some of my systems that page fault tracing doesn't
work:
cd /sys/kernel/debug/tracing
echo 1 > events/exceptions/enable
cat trace;
# nothing shows up
I eventually traced it down to CONFIG_KVM_GUEST. At least in a
KVM VM, enabling that option breaks page fault tracing, and
disabling fixes it. I tried on some old kernels and this does
not appear to be a regression: it never worked.
There are two page-fault entry functions today. One when tracing
is on and another when it is off. The KVM code calls do_page_fault()
directly instead of calling the traced version:
> dotraplinkage void __kprobes
> do_async_page_fault(struct pt_regs *regs, unsigned long
> error_code)
> {
> enum ctx_state prev_state;
>
> switch (kvm_read_and_reset_pf_reason()) {
> default:
> do_page_fault(regs, error_code);
> break;
> case KVM_PV_REASON_PAGE_NOT_PRESENT:
I'm also having problems with the page fault tracing on bare
metal (same symptom of no trace output). I'm unsure if it's
related.
Steven had an alternative to this which has zero overhead when
tracing is off where this includes the standard noops even when
tracing is disabled. I'm unconvinced that the extra complexity
of his apporach:
http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
is worth it, expecially considering that the KVM code is already
making page fault entry slower here. This solution is
dirt-simple.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: kvm@vger.kernel.org
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
CS.RPL is not equal to the CPL in the few instructions between
setting CR0.PE and reloading CS. And CS.DPL is also not equal
to the CPL for conforming code segments.
However, SS.DPL *is* always equal to the CPL except for the weird
case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
value in the STAR MSR, but force CPL=3 (Intel instead forces
SS.DPL=SS.RPL=CPL=3).
So this patch:
- modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
the above case with SYSRET is not broken further, and the way
to fix it would be to pass the CPL to userspace and back
- modifies VMX to always return the CPL from SS.DPL (except
forcing it to 0 if we are emulating real mode via vm86 mode;
in vm86 mode all DPLs have to be 3, but real mode does allow
privileged instructions). It also removes the CPL cache,
which becomes a duplicate of the SS access rights cache.
This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
CR0.PE=1 but before CS has been reloaded.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Table 7-1 of the SDM mentions a check that the code segment's
DPL must match the selector's RPL. This was not done by KVM,
fix it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
to all the other requirements) and will be the new CPL. So far this
worked by carefully setting the CS selector and flag before doing the
task switch; setting CS.selector will already change the CPL.
However, this will not work once we get the CPL from SS.DPL, because
then you will have to set the full segment descriptor cache to change
the CPL. ctxt->ops->cpl(ctxt) will then return the old CPL during the
task switch, and the check that SS.DPL == CPL will fail.
Temporarily assume that the CPL comes from CS.RPL during task switch
to a protected-mode task. This is the same approach used in QEMU's
emulation code, which (until version 2.0) manually tracks the CPL.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This patch splits the SIE state guest prefix at offset 4
into a prefix bit field. Additionally it provides the
access functions:
- kvm_s390_get_prefix()
- kvm_s390_set_prefix()
to access the prefix per vcpu.
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
The patch adds functionality to retrieve the IBC configuration
by means of function sclp_get_ibc().
Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
If the sigp interpretation facility is installed, most SIGP EXTERNAL CALL
operations will be interpreted instead of intercepted. A partial execution
interception will occurr at the sending cpu only if the target cpu is in the
wait state ("W" bit in the cpuflags set). Instruction interception will only
happen in error cases (e.g. cpu addr invalid).
As a sending cpu might set the external call interrupt pending flags at the
target cpu at every point in time, we can't handle this kind of interrupt using
our kvm interrupt injection mechanism. The injection will be done automatically
by the SIE when preparing the start of the target cpu.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
CC: Thomas Huth <thuth@linux.vnet.ibm.com>
[Adopt external call injection to check for sigp interpretion]
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The current trace definition doesn't work very well with the perf tool.
Perf shows a "insn_to_mnemonic not found" message. Let's handle the
decoding completely in a parseable format.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch adds a new decoder of SIE intercepted instructions.
The decoder implemented as a macro and potentially can be used in
both kernelspace and userspace.
Note that this simplified instruction decoder is only intended to be
used with the subset of instructions that may cause a SIE intercept.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Use the symbolic translation tables from sie.h for decoding diag, sigp
and sie exit codes.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch defines tables of reasons for exiting from SIE mode
in a new sie.h header file. Tables contain SIE intercepted codes,
intercepted instructions and program interruptions codes.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Use the new helper function kvm_arch_fault_in_page() for faulting-in
the guest pages and only inject addressing errors when we've really
hit a bad address (and return other error codes to userspace instead).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Rework the function kvm_arch_fault_in_sync() to become a proper helper
function for faulting-in a guest page. Now it takes the guest address as
a parameter and does not ignore the possible error code from gmap_fault()
anymore (which could cause undetected error conditions before).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If the new PSW for program interrupts is invalid, the VM ends up
in an endless loop of specification exceptions. Since there is not
much left we can do in this case, we should better drop to userspace
instead so that the crash can be reported to the user.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
As a program status word is also invalid (and thus generates an
specification exception) if the instruction address is not even,
we should test this in is_valid_psw(), too. This patch also exports
the function so that it becomes available for other parts of the
S390 KVM code as well.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Use the mm semaphore to serialize multiple invocations of s390_enable_skey.
The second CPU faulting on a storage key operation needs to wait for the
completion of the page table update. Taking the mm semaphore writable
has the positive side-effect that it prevents any host faults from
taking place which does have implications on keys vs PGSTE.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
PSCIv0.2 adds a new function called AFFINITY_INFO, which
can be used to query if a specified CPU has actually gone
offline. Calling this function via cpu_kill ensures that
a CPU has quiesced after a call to cpu_die. This helps
prevent the CPU from doing arbitrary bad things when data
or instructions are clobbered (as happens with kexec)
in the window between a CPU announcing that it is dead
and said CPU leaving the kernel.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
The PSCIv0.2 spec defines standard values of function IDs
and introduces a few new functions. Detect version of PSCI
and appropriately select the right PSCI functions.
Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
Reviewed-by: Rob Herring <robh@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Regression of 346874c9: PAE is set in long mode, but that does not mean
we have valid PDPTRs.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>