Commit Graph

114543 Commits

Author SHA1 Message Date
Mohammed Gamal
ea953ef0ca KVM: VMX: Add invalid guest state handler
This adds the invalid guest state handler function which invokes the x86
emulator until getting the guest to a VMX-friendly state.

[avi: leave atomic context if scheduling]
[guillaume: return to atomic context correctly]

Signed-off-by: Laurent Vivier <laurent.vivier@bull.net>
Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net>
Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
04fa4d3211 KVM: VMX: Add module parameter and emulation flag.
The patch adds the module parameter required to enable emulating invalid
guest state, as well as the emulation_required flag used to drive
emulation whenever needed.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Mohammed Gamal
648dfaa7df KVM: VMX: Add Guest State Validity Checks
This patch adds functions to check whether guest state is VMX compliant.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Amit Shah
6762b7299a KVM: Device assignment: Check for privileges before assigning irq
Even though we don't share irqs at the moment, we should ensure
regular user processes don't try to allocate system resources.

We check for capability to access IO devices (CAP_SYS_RAWIO) before
we request_irq on behalf of the guest.

Noticed by Avi.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:20 +02:00
Avi Kivity
dc7404cea3 KVM: Handle spurious acks for PIT interrupts
Spurious acks can be generated, for example if the PIC is being reset.
Handle those acks gracefully rather than flooding the log with warnings.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Marcelo Tosatti
85428ac7c3 KVM: fix i8259 reset irq acking
The irq ack during pic reset has three problems:

- Ignores slave/master PIC, using gsi 0-8 for both.
- Generates an ACK even if the APIC is in control.
- Depends upon IMR being clear, which is broken if the irq was masked
at the time it was generated.

The last one causes the BIOS to hang after the first reboot of
Windows installation, since PIT interrupts stop.

[avi: fix check whether pic interrupts are seen by cpu]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Avi Kivity
8ceed34744 KVM: Simplify exception entries by using __ASM_SIZE and _ASM_PTR
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Avi Kivity
ecfc79c700 KVM: VMX: Use interrupt queue for !irqchip_in_kernel
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Marcelo Tosatti
29415c37f0 KVM: set debug registers after "schedulable" section
The vcpu thread can be preempted after the guest_debug_pre() callback,
resulting in invalid debug registers on the new vcpu.

Move it inside the non-preemptable section.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Ben-Ami Yassour
8349b5cd81 KVM: remove unused field from the assigned dev struct
Remove unused field: struct kvm_assigned_pci_dev assigned_dev
from struct: struct kvm_assigned_dev_kernel

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Sheng Yang
464d17c8b7 KVM: VMX: Clean up magic number 0x66 in init_rmode_tss
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:19 +02:00
Dave Hansen
6ad18fba05 KVM: Reduce stack usage in kvm_pv_mmu_op()
We're in a hot path.  We can't use kmalloc() because
it might impact performance.  So, we just stick the buffer that
we need into the kvm_vcpu_arch structure.  This is used very
often, so it is not really a waste.

We also have to move the buffer structure's definition to the
arch-specific x86 kvm header.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
b772ff362e KVM: Reduce stack usage in kvm_arch_vcpu_ioctl()
[sheng: fix KVM_GET_LAPIC using wrong size]

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
fa3795a730 KVM: Reduce stack usage in kvm_vcpu_ioctl()
Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Dave Hansen
f0d662759a KVM: Reduce kvm stack usage in kvm_arch_vm_ioctl()
On my machine with gcc 3.4, kvm uses ~2k of stack in a few
select functions.  This is mostly because gcc fails to
notice that the different case: statements could have their
stack usage combined.  It overflows very nicely if interrupts
happen during one of these large uses.

This patch uses two methods for reducing stack usage.
1. dynamically allocate large objects instead of putting
   on the stack.
2. Use a union{} member for all of the case variables. This
   tricks gcc into combining them all into a single stack
   allocation. (There's also a comment on this)

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Ben-Ami Yassour
4d5c5d0fe8 KVM: pci device assignment
Based on a patch from: Amit Shah <amit.shah@qumranet.com>

This patch adds support for handling PCI devices that are assigned to
the guest.

The device to be assigned to the guest is registered in the host kernel
and interrupt delivery is handled.  If a device is already assigned, or
the device driver for it is still loaded on the host, the device
assignment is failed by conveying a -EBUSY reply to the userspace.

Devices that share their interrupt line are not supported at the moment.

By itself, this patch will not make devices work within the guest.
The VT-d extension is required to enable the device to perform DMA.
Another alternative is PVDMA.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:18 +02:00
Ben-Ami Yassour
cbff90a7ca KVM: direct mmio pfn check
Userspace may specify memory slots that are backed by mmio pages rather than
normal RAM.  In some cases it is not enough to identify these mmio pages
by pfn_valid().  This patch adds checking the PageReserved as well.

Signed-off-by: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Muli Ben-Yehuda <muli@il.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Glauber Costa
0293615f3f x86: KVM guest: use paravirt function to calculate cpu khz
We're currently facing timing problems in guests that do
calibration under heavy load, and then the load vanishes.
This means we'll have a much lower lpj than we actually should,
and delays end up taking less time than they should, which is a
nasty bug.

Solution is to pass on the lpj value from host to guest, and have it
preset.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Glauber Costa
3807f345b2 x86: paravirt: factor out cpu_khz to common code
KVM intends to use paravirt code to calibrate khz. Xen
current code will do just fine. So as a first step, factor out
code to pvclock.c.

Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Marcelo Tosatti
3cf57fed21 KVM: PIT: fix injection logic and count
The PIT injection logic is problematic under the following cases:

1) If there is a higher priority vector to be delivered by the time
kvm_pit_timer_intr_post is invoked ps->inject_pending won't be set.
This opens the possibility for missing many PIT event injections (say if
guest executes hlt at this point).

2) ps->inject_pending is racy with more than two vcpus. Since there's no locking
around read/dec of pt->pending, two vcpu's can inject two interrupts for a single
pt->pending count.

Fix 1 by using an irq ack notifier: only reinject when the previous irq
has been acked. Fix 2 with appropriate locking around manipulation of
pending count and irq_ack by the injection / ack paths.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:17 +02:00
Marcelo Tosatti
f52447261b KVM: irq ack notification
Based on a patch from: Ben-Ami Yassour <benami@il.ibm.com>
which was based on a patch from: Amit Shah <amit.shah@qumranet.com>

Notify IRQ acking on PIC/APIC emulation. The previous patch missed two things:

- Edge triggered interrupts on IOAPIC
- PIC reset with IRR/ISR set should be equivalent to ack (LAPIC probably
needs something similar).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
CC: Amit Shah <amit.shah@qumranet.com>
CC: Ben-Ami Yassour <benami@il.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Avi Kivity
564f15378f KVM: Add irq ack notifier list
This can be used by kvm subsystems that are interested in when
interrupts are acked, for example time drift compensation.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard
49dd2c4928 KVM: powerpc: Map guest userspace with TID=0 mappings
When we use TID=N userspace mappings, we must ensure that kernel mappings have
been destroyed when entering userspace. Using TID=1/TID=0 for kernel/user
mappings and running userspace with PID=0 means that userspace can't access the
kernel mappings, but the kernel can directly access userspace.

The net is that we don't need to flush the TLB on privilege switches, but we do
on guest context switches (which are far more infrequent). Guest boot time
performance improvement: about 30%.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard
83aae4a809 KVM: ppc: Write only modified shadow entries into the TLB on exit
Track which TLB entries need to be written, instead of overwriting everything
below the high water mark. Typically only a single guest TLB entry will be
modified in a single exit.

Guest boot time performance improvement: about 15%.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard
20754c2495 KVM: ppc: Stop saving host TLB state
We're saving the host TLB state to memory on every exit, but never using it.
Originally I had thought that we'd want to restore host TLB for heavyweight
exits, but that could actually hurt when context switching to an unrelated host
process (i.e. not qemu).

Since this decreases the performance penalty of all exits, this patch improves
guest boot time by about 15%.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard
6a0ab738ef KVM: ppc: guest breakpoint support
Allow host userspace to program hardware debug registers to set breakpoints
inside guests.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Alexander Graf
b5e2fec0eb KVM: Ignore DEBUGCTL MSRs with no effect
Netware writes to DEBUGCTL and reads from the DEBUGCTL and LAST*IP MSRs
without further checks and is really confused to receive a #GP during that.
To make it happy we should just make them stubs, which is exactly what SVM
already does.

Writes to DEBUGCTL that are vendor-specific are resembled to behave as if the
virtual CPU does not know them.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Avi Kivity
313dbd49dc KVM: VMX: Avoid vmwrite(HOST_RSP) when possible
Usually HOST_RSP retains its value across guest entries.  Take advantage
of this and avoid a vmwrite() when this is so.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Christian Ehrhardt
3b4bd7969f KVM: ppc: trace powerpc instruction emulation
This patch adds a trace point for the instruction emulation on embedded powerpc
utilizing the KVM_TRACE interface.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Jerone Young
31711f2294 KVM: ppc: adds trace points for ppc tlb activity
This patch adds trace points to track powerpc TLB activities using the
KVM_TRACE infrastructure.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Jerone Young
12f6755602 KVM: ppc: enable KVM_TRACE building for powerpc
This patch enables KVM_TRACE to build for PowerPC arch. This means just
adding sections to Kconfig and Makefile.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Christian Ehrhardt
3f7f95c65e KVM: kvmtrace: replace get_cycles with ktime_get v3
The current kvmtrace code uses get_cycles() while the interpretation would be
easier using using nanoseconds. ktime_get() should give at least the same
accuracy as get_cycles on all architectures (even better on 32bit archs) but
at a better unit (e.g. comparable between hosts with different frequencies.

[avi: avoid ktime_t in public header]

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Christian Ehrhardt
e32c8f2c07 KVM: kvmtrace: Remove use of bit fields in kvm trace structure
This patch fixes kvmtrace use on big endian systems. When using bit fields the
compiler will lay data out in the wrong order expected when laid down into a
file.
This fixes it by using one variable instead of using bit fields.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
80e31d4f61 KVM: SVM: Unify register save/restore across 32 and 64 bit hosts
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
c801949ddf KVM: VMX: Unify register save/restore across 32 and 64 bit hosts
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Jan Kiszka
77ab6db0a1 KVM: VMX: Reinject real mode exception
As we execute real mode guests in VM86 mode, exception have to be
reinjected appropriately when the guest triggered them. For this purpose
the patch adopts the real-mode injection pattern used in vmx_inject_irq
to vmx_queue_exception, additionally taking care that the IP is set
correctly for #BP exceptions. Furthermore it extends
handle_rmode_exception to reinject all those exceptions that can be
raised in real mode.

This fixes the execution of himem.exe from FreeDOS and also makes its
debug.com work properly.

Note that guest debugging in real mode is broken now. This has to be
fixed by the scheduled debugging infrastructure rework (will be done
once base patches for QEMU have been accepted).

Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Jan Kiszka
19bd8afdc4 KVM: Consolidate XX_VECTOR defines
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Avi Kivity
7edd0ce058 KVM: Consolidate PIC isr clearing into a function
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:14 +02:00
Mohammed Gamal
60bd83a125 KVM: VMX: Remove redundant check in handle_rmode_exception
Since checking for vcpu->arch.rmode.active is already done whenever we
call handle_rmode_exception(), checking it inside the function is redundant.

Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
f7d9238f5d KVM: VMX: Move interrupt post-processing to vmx_complete_interrupts()
Instead of looking at failed injections in the vm entry path, move
processing to the exit path in vmx_complete_interrupts().  This simplifes
the logic and removes any state that is hidden in vmx registers.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
937a7eaef9 KVM: Add a pending interrupt queue
Similar to the exception queue, this hold interrupts that have been
accepted by the virtual processor core but not yet injected.

Not yet used.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
35920a3569 KVM: VMX: Fix pending exception processing
The vmx code assumes that IDT-Vectoring can only be set when an exception
is injected due to the exception in question.  That's not true, however:
if the exception is injected correctly, and later another exception occurs
but its delivery is blocked due to a fault, then we will incorrectly assume
the first exception was not delivered.

Fix by unconditionally dequeuing the pending exception, and requeuing it
(or the second exception) if we see it in the IDT-Vectoring field.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
26eef70c3e KVM: Clear exception queue before emulating an instruction
If we're emulating an instruction, either it will succeed, in which case
any previously queued exception will be spurious, or we will requeue the
same exception.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
668f612fa0 KVM: VMX: Move nmi injection failure processing to vm exit path
Instead of processing nmi injection failure in the vm entry path, move
it to the vm exit path (vm_complete_interrupts()).  This separates nmi
injection from nmi post-processing, and moves the nmi state from the VT
state into vcpu state (new variable nmi_injected specifying an injection
in progress).

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:13 +02:00
Avi Kivity
cf393f7566 KVM: Move NMI IRET fault processing to new vmx_complete_interrupts()
Currently most interrupt exit processing is handled on the entry path,
which is confusing.  Move the NMI IRET fault processing to a new function,
vmx_complete_interrupts(), which is called on the vmexit path.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Avi Kivity
5b5c6a5a60 KVM: MMU: Simplify kvm_mmu_zap_page()
The twisty maze of conditionals can be reduced.

[joerg: fix tlb flushing]

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Avi Kivity
31aa2b44af KVM: MMU: Separate the code for unlinking a shadow page from its parents
Place into own function, in preparation for further cleanups.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Amit Shah
867767a365 KVM: Introduce kvm_set_irq to inject interrupts in guests
This function injects an interrupt into the guest given the kvm struct,
the (guest) irq number and the interrupt level.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Hollis Blanchard
d98e634635 KVM: Move KVM TRACE DEFINITIONS to common header
Move KVM trace definitions from x86 specific kvm headers to common kvm
headers to create a cross-architecture numbering scheme for trace
events. This means the kvmtrace_format userspace tool won't need to know
which architecture produced the log file being processed.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:12 +02:00
Marcelo Tosatti
5fdbf9765b KVM: x86: accessors for guest registers
As suggested by Avi, introduce accessors to read/write guest registers.
This simplifies the ->cache_regs/->decache_regs interface, and improves
register caching which is important for VMX, where the cost of
vmcs_read/vmcs_write is significant.

[avi: fix warnings]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:13:57 +02:00