Commit Graph

455005 Commits

Author SHA1 Message Date
Mark D Rustad
42cbc04fd3 x86/kvm: Resolve shadow warnings in macro expansion
Resolve shadow warnings that appear in W=2 builds. Instead of
using ret to hold the return pointer, save the length in a new
variable saved_len and compute the pointer on exit. This also
resolves a very technical error, in that ret was declared as
a const char *, when it really was a char * const.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-31 16:33:29 +02:00
Paolo Bonzini
307d2740b1 Two fixes for recently introduced regressions
- a memory leak on busy SIGP
 - pontentially lost SIGP stop in rare situations (shutdown loops)
 
 The first issue is not part of a released kernel. The 2nd issue is
 present in all KVM versions, but did not trigger before commit
 7dfc63cf97 (KVM: s390: allow only one SIGP STOP
 (AND STORE STATUS) at a time) with Linux as a guest.
 So no need for cc stable
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJT2fIAAAoJEBF7vIC1phx8IU4P/A83LP0jpEu3tdvIjU4g9Ev8
 yan96/6d/QPOe4mlR7TkalbnkC0mT09ZFnUKnfdVIYD89HWa0gnoXKXhXYge8ZoK
 uB90uTV/O/ySJuSYUkITNHGqvZPkqGc75GrdBsQ1KP9Kcs475KREkStEj4nXhSgb
 IoIL/yuBkYpqroEcZbgV6r2u2UYXGdle131OKHLO4xJ96tniVWsSM32E1/sh29zj
 Sr9ho0dNHdKO4ILKc03osCS3Y49gXC1nmODPLvTs1pBQtw7RSJ5dM9aqCR1XQkaD
 q5PJFQsjBQEhrHOTOZUYAxdk7j2kT2qxZqwYahL+WelbKVZgwdNQMZ78Xrq1RXVp
 hQC29KCYrNbzZMVim0ZQCSWVrlQtdoAnjXejeRQShVVYCjo0JckPaSNB+tpPCtaV
 +e4wPyhLFDmKZIX1Pme716TYxuLL4pCwU8wd6b+DGJ7hq9rzILs8guQua1eMYiCD
 6CjSkEbpDhSthXJ3qdDpIWId13ppY0sVvaFoC3Jnwrj167gRXl0tfK2nX4EpexaG
 rrqWXhwLFuRWm4ZDJaE6tg/bQSsDumXSUX1zGUoFe522DppuKBJCeYzXTrqtokYd
 vyl4CzHlkov1K7EmJfUu69BMyVDWPX7JMOnM2k4Mde2io6IEIqYQvLK5fcAhAE7r
 ElmgACkwwrEL62qryq5l
 =+xZE
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140730' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

Two fixes for recently introduced regressions
- a memory leak on busy SIGP
- pontentially lost SIGP stop in rare situations (shutdown loops)

The first issue is not part of a released kernel. The 2nd issue is
present in all KVM versions, but did not trigger before commit
7dfc63cf97 (KVM: s390: allow only one SIGP STOP
(AND STORE STATUS) at a time) with Linux as a guest.
So no need for cc stable
2014-07-31 16:31:49 +02:00
David Hildenbrand
db37386147 KVM: s390: rework broken SIGP STOP interrupt handling
A VCPU might never stop if it intercepts (for whatever reason) between
"fake interrupt delivery" and execution of the stop function.

Heart of the problem is that SIGP STOP is an interrupt that has to be
processed on every SIE entry until the VCPU finally executes the stop
function.

This problem was made apparent by commit 7dfc63cf97
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time).
With the old code, the guest could (incorrectly) inject SIGP STOPs
multiple times. The bug of losing a sigp stop exists in KVM before
7dfc63cf97, but it was hidden by Linux guests doing a sigp stop loop.
The new code (rightfully) returns CC=2 and does not queue a new
interrupt.

This patch is a simple fix of the problem. Longterm we are going to
rework that code - e.g. get rid of the action bits and so on.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[some additional patch description]
2014-07-31 09:20:35 +02:00
Paolo Bonzini
0f6c0a740b KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table
Currently, the EOI exit bitmap (used for APICv) does not include
interrupts that are masked.  However, this can cause a bug that manifests
as an interrupt storm inside the guest.  Alex Williamson reported the
bug and is the one who really debugged this; I only wrote the patch. :)

The scenario involves a multi-function PCI device with OHCI and EHCI
USB functions and an audio function, all assigned to the guest, where
both USB functions use legacy INTx interrupts.

As soon as the guest boots, interrupts for these devices turn into an
interrupt storm in the guest; the host does not see the interrupt storm.
Basically the EOI path does not work, and the guest continues to see the
interrupt over and over, even after it attempts to mask it at the APIC.
The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
with not many changes in the area of APIC/IOAPIC handling).

Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
on in the eoi_exit_bitmap and TMR, and things then work.  What happens
is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
It does not have set bit 59 because the RTE was masked, so the IOAPIC
never sees the EOI and the interrupt continues to fire in the guest.

My guess was that the guest is masking the interrupt in the redirection
table in the interrupt routine, i.e. while the interrupt is set in a
LAPIC's ISR, The simplest fix is to ignore the masking state, we would
rather have an unnecessary exit rather than a missed IRQ ACK and anyway
IOAPIC interrupts are not as performance-sensitive as for example MSIs.
Alex tested this patch and it fixed his bug.

[Thanks to Alex for his precise description of the problem
 and initial debugging effort.  A lot of the text above is
 based on emails exchanged with him.]

Reported-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-30 20:22:30 +02:00
Chris J Arges
296f047502 KVM: vmx: remove duplicate vmx_mpx_supported() prototype
Remove a prototype which was added by both 93c4adc7af and 36be0b9deb.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-30 17:43:57 +02:00
Christian Borntraeger
d514f42641 KVM: s390: Fix memory leak on busy SIGP stop
commit 7dfc63cf97
(KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time)
introduced a memory leak if a sigp stop is already pending. Free
the allocated inti structure.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
2014-07-30 15:29:40 +02:00
Mark Rustad
b55a8144d1 x86/kvm: Resolve shadow warning from min macro
Resolve a shadow warning generated in W=2 builds by the nested
use of the min macro by instead using the min3 macro for the
minimum of 3 values.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-25 16:05:54 +02:00
Mark Rustad
25f97ff451 kvm: Resolve missing-field-initializers warnings
Resolve missing-field-initializers warnings seen in W=2 kernel
builds by having macros generate more elaborated initializers.
That is enough to silence the warnings.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-25 16:05:46 +02:00
Paolo Bonzini
03916db934 Replace NR_VMX_MSR with its definition
Using ARRAY_SIZE directly makes it easier to read the code.  While touching
the code, replace the division by a multiplication in the recently added
BUILD_BUG_ON.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24 14:21:57 +02:00
Nadav Amit
0123be429f KVM: x86: Assertions to check no overrun in MSR lists
Currently there is no check whether shared MSRs list overrun the allocated size
which can results in bugs. In addition there is no check that vmx->guest_msrs
has sufficient space to accommodate all the VMX msrs.  This patch adds the
assertions.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24 14:16:57 +02:00
Nadav Amit
d6e8c85456 KVM: x86: set rflags.rf during fault injection
x86 does not automatically set rflags.rf during event injection. This patch
does partial job, setting rflags.rf upon fault injection.  It does not handle
the setting of RF upon interrupt injection on rep-string instruction.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24 14:04:01 +02:00
Nadav Amit
b9a1ecb909 KVM: x86: Setting rflags.rf during rep-string emulation
This patch updates RF for rep-string emulation.  The flag is set upon the first
iteration, and cleared after the last (if emulated). It is intended to make
sure that if a trap (in future data/io #DB emulation) or interrupt is delivered
to the guest during the rep-string instruction, RF will be set correctly. RF
affects whether instruction breakpoint in the guest is masked.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-24 14:03:54 +02:00
Paolo Bonzini
c756ad036f Bugfixes
--------
 - add IPTE to trace event decoder
 - document and advertise KVM_CAP_S390_IRQCHIP
 
 Cleanups
 --------
 - Reuse kvm_vcpu_block for s390
 - Get rid of tasklet for wakup processing
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTzRsBAAoJEBF7vIC1phx8gf4P/0TlwS6YoLsWJ2VFOwLyjsbZ
 SU3WVPKgWwAlEE1XyTHpnKIainpUUp7qc227/QZDt7plONfZHir/kmkAeZlfqVD/
 tVogeYXBHFvLIvBJVLOhcGnSqK6WaHiUFJtv9Goz3wD6GCe+qzFgV15gXlR9Nw7E
 mOISwpFOoa8lDWj0XTF5sfchdwyVY3Nj49W2quNmpp8vKkwHSyHzyu5MKd7wBmI2
 9LV25/DzDKW8uslEXINgUFpoUT/LQ4regWjhHPXaoA7+zBvttD/33zdXzeR8qBxm
 FnV1Pypzxd69YfHg8CVTRoLS3knKdWNAReCGC8LX1XyF17cOhuU3FlkQRln8dGLP
 B2M3p1sMUR+fiXkaEZdRdFVkBtxEvmAzyAcBd7jklGpM2/RTHOcYfzQl66UO4EK4
 dfKvK5NuCz+t4q6C6rUgjLDKv2GxZulzARVMlI99bBFc1g57HHDD+trt0yXTAX81
 GqLIkrAT0H+RX1IrQNvhgEg2j8wME2tCmw/BXFYuLLlREoDniMKPZwteBSLZMjy5
 6BstgWYarOdul59XWNjogSlPkyaqh0fxDul00X+/sTxEwxgJfApOmpcU7XekD5FP
 4EHC8BHG3TBEjAoMD71La7a4P6vh8F54KLPta8YrtP3pe1d4Ez/15Mi2Wf/sdAVQ
 0Ym0Q9y1QMeH/s/CPDkB
 =8McL
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140721' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

Bugfixes
--------
- add IPTE to trace event decoder
- document and advertise KVM_CAP_S390_IRQCHIP

Cleanups
--------
- Reuse kvm_vcpu_block for s390
- Get rid of tasklet for wakup processing
2014-07-22 10:22:53 +02:00
Nadav Amit
6f43ed01e8 KVM: x86: DR6/7.RTM cannot be written
Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is
not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the
current KVM implementation. This bug is apparent only if the MOV-DR instruction
is emulated or the host also debugs the guest.

This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and
set respectively. It also sets DR6.RTM upon every debug exception. Obviously,
it is not a complete fix, as debugging of RTM is still unsupported.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 17:17:52 +02:00
Paolo Bonzini
9a2a05b9ed KVM: nVMX: clean up nested_release_vmcs12 and code around it
Make nested_release_vmcs12 idempotent.

Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 14:29:49 +02:00
Paolo Bonzini
4fa7734c62 KVM: nVMX: fix lifetime issues for vmcs02
free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in
order to detach it from the shadow vmcs.  However, this is not
available anymore after commit 26a865f4aa (KVM: VMX: fix use after
free of vmx->loaded_vmcs, 2014-01-03).

Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.

Reported-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 14:29:49 +02:00
Nadav Amit
c9cdd085bb KVM: x86: Defining missing x86 vectors
Defining XE, XM and VE vector numbers.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 14:18:51 +02:00
Nadav Amit
4161a56906 KVM: x86: emulator injects #DB when RFLAGS.RF is set
If the RFLAGS.RF is set, then no #DB should occur on instruction breakpoints.
However, the KVM emulator injects #DB regardless to RFLAGS.RF. This patch fixes
this behavior. KVM, however, still appears not to update RFLAGS.RF correctly,
regardless of this patch.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:43:09 +02:00
Nadav Amit
6c6cb69b8e KVM: x86: Cleanup of rflags.rf cleaning
RFLAGS.RF was cleaned in several functions (e.g., syscall) in the x86 emulator.
Now that we clear it before the execution of an instruction in the emulator, we
can remove the specific cleanup of RFLAGS.RF.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:42:39 +02:00
Nadav Amit
4467c3f1ad KVM: x86: Clear rflags.rf on emulated instructions
When an instruction is emulated RFLAGS.RF should be cleared. KVM previously did
not do so. This patch clears RFLAGS.RF after interception is done.  If a fault
occurs during the instruction, RFLAGS.RF will be set by a previous patch.  This
patch does not handle the case of traps/interrupts during rep-strings. Traps
are only expected to occur on debug watchpoints, and those are anyhow not
handled by the emulator.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:42:21 +02:00
Nadav Amit
163b135e7b KVM: x86: popf emulation should not change RF
RFLAGS.RF is always zero after popf. Therefore, popf should not updated RF, as
anyhow emulating popf, just as any other instruction should clear RFLAGS.RF.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:41:58 +02:00
Nadav Amit
bb663c7ada KVM: x86: Clearing rflags.rf upon skipped emulated instruction
When skipping an emulated instruction, rflags.rf should be cleared as it would
be on real x86 CPU.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-21 13:41:32 +02:00
Paolo Bonzini
ec10b72701 This series enables the "KVM_(S|G)ET_MP_STATE" ioctls on s390 to make
the cpu state settable by user space.
 
 This is necessary to avoid races in s390 SIGP/reset handling which
 happen because some SIGPs are handled in QEMU, while others are
 handled in the kernel. Together with the busy conditions as return
 value of SIGP races happen especially in areas like starting and
 stopping of CPUs. (For example, there is a program 'cpuplugd', that
 runs on several s390 distros which does automatic onlining and
 offlining on cpus.)
 
 As soon as the MPSTATE interface is used, user space takes complete
 control of the cpu states. Otherwise the kernel will use the old way.
 
 Therefore, the new kernel continues to work fine with old QEMUs.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJTxSp+AAoJEBF7vIC1phx8RYgP/0mBaV3isXrCR+iisLJYNJjq
 s6Ssl9TUMR3+lRZ9epytRd02UfkBGqVaW+HtRh5JP5KKAGSn2i3eB9WDAY7bD8i7
 DVLVE2aO7okw1Z2G6CEO27dRfS0SCAfj/X77BRISyqVxK4eY86lAhQdyU5nB67TR
 c0Fk4YHwjeBoQxZTAQr2xL4052gkB+Jp/PpltszILonsYNASOsxbcHqH4t+0SFmo
 FGXydBn6eN+e3fWQSxetkrxvj14sj5K6ljiZoSMyw5nDfyrRn8RcCX87GjNLG+GR
 X0eFB9Nl83NQoC5ksQtojunsx57+cEMgoWbdK7mxoqp+6+wJrvYB2eSKY77RYH4J
 2xIy3klF/ypSZt7gxwL0pugi9QodGW39mA+stuezKUwyPalpMxHmRRwvHitGJjkP
 KwvWc4m2QebKJ6RHhgkvZ0gMaVUJcqitrlXUxWgAAcH6MNBIC1g2ufsxnv51V/O6
 SnspBWTPVDUqO6bJP4brJiAt8K7Jx3Bg5frpyN0jparh8Nmu3Kwfz0RtDYrUYyOe
 p2o2lzY5L6gvY3iOrhvoc9zbpbyuycon8nUP4WOh/eGvIM2WV6cxmkck1Fo/wNso
 evunS1FNvbN7Wxk5h4/XSVsfdcM/mUa3E7cVxgpg8+Aqse9qfpM35BlNWR+zf0G+
 AdF90u/I+3mcRKWoSrKu
 =86qw
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140715' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next

This series enables the "KVM_(S|G)ET_MP_STATE" ioctls on s390 to make
the cpu state settable by user space.

This is necessary to avoid races in s390 SIGP/reset handling which
happen because some SIGPs are handled in QEMU, while others are
handled in the kernel. Together with the busy conditions as return
value of SIGP races happen especially in areas like starting and
stopping of CPUs. (For example, there is a program 'cpuplugd', that
runs on several s390 distros which does automatic onlining and
offlining on cpus.)

As soon as the MPSTATE interface is used, user space takes complete
control of the cpu states. Otherwise the kernel will use the old way.

Therefore, the new kernel continues to work fine with old QEMUs.
2014-07-21 13:35:43 +02:00
Christian Borntraeger
e59d120f96 KVM: s390: add ipte to trace event decoding
IPTE intercept can happen, let's decode that.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-07-21 13:22:47 +02:00
Cornelia Huck
78599d9004 KVM: s390: advertise KVM_CAP_S390_IRQCHIP
We should advertise all capabilities, including those that can
be enabled.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:47 +02:00
Cornelia Huck
8a366a4bae KVM: s390: document KVM_CAP_S390_IRQCHIP
Let's document that this is a capability that may be enabled per-vm.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:46 +02:00
Cornelia Huck
0907c855b3 KVM: document target of capability enablement
Capabilities can be enabled on a vcpu or (since recently) on a vm. Document
this and note for the existing capabilites whether they are per-vcpu or
per-vm.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:46 +02:00
David Hildenbrand
ea74c0ea1b KVM: s390: remove the tasklet used by the hrtimer
We can get rid of the tasklet used for waking up a VCPU in the hrtimer
code but wakeup the VCPU directly.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:42 +02:00
David Hildenbrand
0e9c85a5a3 KVM: s390: move vcpu wakeup code to a central point
Let's move the vcpu wakeup code to a central point.

We should set the vcpu->preempted flag only if the target is actually sleeping
and before the real wakeup happens. Otherwise the preempted flag might be set,
when not necessary. This may result in immediate reschedules after schedule()
in some scenarios.

The wakeup code doesn't require the local_int.lock to be held.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:38 +02:00
David Hildenbrand
433b9ee43c KVM: s390: remove _bh locking from start_stop_lock
The start_stop_lock is no longer acquired when in atomic context, therefore we
can convert it into an ordinary spin_lock.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:34 +02:00
David Hildenbrand
4ae3c0815f KVM: s390: remove _bh locking from local_int.lock
local_int.lock is not used in a bottom-half handler anymore, therefore we can
turn it into an ordinary spin_lock at all occurrences.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:28 +02:00
David Hildenbrand
0759d0681c KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block
This patch cleans up the code in handle_wait by reusing the common code
function kvm_vcpu_block.

signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() are
sufficient for checking if we need to wake-up that VCPU. kvm_vcpu_block
uses these functions, so no checks are lost.

The flag "timer_due" can be removed - kvm_cpu_has_pending_timer() tests whether
the timer is pending, thus the vcpu is correctly woken up.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-07-21 13:22:16 +02:00
Wanpeng Li
963fee1656 KVM: nVMX: Fix virtual interrupt delivery injection
This patch fix bug reported in https://bugzilla.kernel.org/show_bug.cgi?id=73331,
after the patch http://www.spinics.net/lists/kvm/msg105230.html applied, there is
some progress and the L2 can boot up, however, slowly. The original idea of this
fix vid injection patch is from "Zhang, Yang Z" <yang.z.zhang@intel.com>.

Interrupt which delivered by vid should be injected to L1 by L0 if current is in
L1, or should be injected to L2 by L0 through the old injection way if L1 doesn't
have set External-interrupt exiting bit. The current logic doen't consider these
cases. This patch fix it by vid intr to L1 if current is L1 or L2 through old
injection way if L1 doen't have External-interrupt exiting bit set.

Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
Signed-off-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-17 14:45:26 +02:00
Nadav Amit
68efa764f3 KVM: x86: Emulator support for #UD on CPL>0
Certain instructions (e.g., mwait and monitor) cause a #UD exception when they
are executed in user mode. This is in contrast to the regular privileged
instructions which cause #GP. In order not to mess with SVM interception of
mwait and monitor which assumes privilege level assertions take place before
interception, a flag has been added.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:05 +02:00
Nadav Amit
10e38fc7ca KVM: x86: Emulator flag for instruction that only support 16-bit addresses in real mode
Certain instructions, such as monitor and xsave do not support big real mode
and cause a #GP exception if any of the accessed bytes effective address are
not within [0, 0xffff].  This patch introduces a flag to mark these
instructions, including the necassary checks.

Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:04 +02:00
Paolo Bonzini
44583cba91 KVM: x86: use kvm_read_guest_page for emulator accesses
Emulator accesses are always done a page at a time, either by the emulator
itself (for fetches) or because we need to query the MMU for address
translations.  Speed up these accesses by using kvm_read_guest_page
and, in the case of fetches, by inlining kvm_read_guest_virt_helper and
dropping the loop around kvm_read_guest_page.

This final tweak saves 30-100 more clock cycles (4-10%), bringing the
count (as measured by kvm-unit-tests) down to 720-1100 clock cycles on
a Sandy Bridge Xeon host, compared to 2300-3200 before the whole series
and 925-1700 after the first two low-hanging fruit changes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:04 +02:00
Paolo Bonzini
719d5a9b24 KVM: x86: ensure emulator fetches do not span multiple pages
When the CS base is not page-aligned, the linear address of the code could
get close to the page boundary (e.g. 0x...ffe) even if the EIP value is
not.  So we need to first linearize the address, and only then compute
the number of valid bytes that can be fetched.

This happens relatively often when executing real mode code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:04 +02:00
Paolo Bonzini
17052f16a5 KVM: emulate: put pointers in the fetch_cache
This simplifies the code a bit, especially the overflow checks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:03 +02:00
Paolo Bonzini
9506d57de3 KVM: emulate: avoid per-byte copying in instruction fetches
We do not need a memory copying loop anymore in insn_fetch; we
can use a byte-aligned pointer to access instruction fields directly
from the fetch_cache.  This eliminates 50-150 cycles (corresponding to
a 5-10% improvement in performance) from each instruction.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:03 +02:00
Paolo Bonzini
5cfc7e0f5e KVM: emulate: avoid repeated calls to do_insn_fetch_bytes
do_insn_fetch_bytes will only be called once in a given insn_fetch and
insn_fetch_arr, because in fact it will only be called at most twice
for any instruction and the first call is explicit in x86_decode_insn.
This observation lets us hoist the call out of the memory copying loop.
It does not buy performance, because most fetches are one byte long
anyway, but it prepares for the next patch.

The overflow check is tricky, but correct.  Because do_insn_fetch_bytes
has already been called once, we know that fc->end is at least 15.  So
it is okay to subtract the number of bytes we want to read.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:02 +02:00
Paolo Bonzini
285ca9e948 KVM: emulate: speed up do_insn_fetch
Hoist the common case up from do_insn_fetch_byte to do_insn_fetch,
and prime the fetch_cache in x86_decode_insn.  This helps a bit the
compiler and the branch predictor, but above all it lays the
ground for further changes in the next few patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:02 +02:00
Bandan Das
41061cdb98 KVM: emulate: do not initialize memopp
rip_relative is only set if decode_modrm runs, and if you have ModRM
you will also have a memopp.  We can then access memopp unconditionally.
Note that rip_relative cannot be hoisted up to decode_modrm, or you
break "mov $0, xyz(%rip)".

Also, move typecast on "out of range value" of mem.ea to decode_modrm.

Together, all these optimizations save about 50 cycles on each emulated
instructions (4-6%).

Signed-off-by: Bandan Das <bsd@redhat.com>
[Fix immediate operands with rip-relative addressing. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:01 +02:00
Bandan Das
573e80fe04 KVM: emulate: rework seg_override
x86_decode_insn already sets a default for seg_override,
so remove it from the zeroed area. Also replace set/get functions
with direct access to the field.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:01 +02:00
Bandan Das
c44b4c6ab8 KVM: emulate: clean up initializations in init_decode_cache
A lot of initializations are unnecessary as they get set to
appropriate values before actually being used. Optimize
placement of fields in x86_emulate_ctxt

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:00 +02:00
Bandan Das
02357bdc8c KVM: emulate: cleanup decode_modrm
Remove the if conditional - that will help us avoid
an "else initialize to 0" Also, rearrange operators
for slightly better code.

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:00 +02:00
Bandan Das
685bbf4ac4 KVM: emulate: Remove ctxt->intercept and ctxt->check_perm checks
The same information can be gleaned from ctxt->d and avoids having
to zero/NULL initialize intercept and check_perm

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:14:00 +02:00
Bandan Das
1498507a47 KVM: emulate: move init_decode_cache to emulate.c
Core emulator functions all belong in emulator.c,
x86 should have no knowledge of emulator internals

Signed-off-by: Bandan Das <bsd@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:59 +02:00
Paolo Bonzini
f5f87dfbc7 KVM: emulate: simplify writeback
The "if/return" checks are useless, because we return X86EMUL_CONTINUE
anyway if we do not return.

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:59 +02:00
Paolo Bonzini
54cfdb3e95 KVM: emulate: speed up emulated moves
We can just blindly move all 16 bytes of ctxt->src's value to ctxt->dst.
write_register_operand will take care of writing only the lower bytes.

Avoiding a call to memcpy (the compiler optimizes it out) gains about
200 cycles on kvm-unit-tests for register-to-register moves, and makes
them about as fast as arithmetic instructions.

We could perhaps get a larger speedup by moving all instructions _except_
moves out of x86_emulate_insn, removing opcode_len, and replacing the
switch statement with an inlined em_mov.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:58 +02:00
Paolo Bonzini
d40a6898e5 KVM: emulate: protect checks on ctxt->d by a common "if (unlikely())"
There are several checks for "peculiar" aspects of instructions in both
x86_decode_insn and x86_emulate_insn.  Group them together, and guard
them with a single "if" that lets the processor quickly skip them all.
Make this more effective by adding two more flag bits that say whether the
.intercept and .check_perm fields are valid.  We will reuse these
flags later to avoid initializing fields of the emulate_ctxt struct.

This skims about 30 cycles for each emulated instructions, which is
approximately a 3% improvement.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-07-11 09:13:58 +02:00