linux/arch/x86/kvm
David Matlack ee3d1570b5 kvm: fix potentially corrupt mmio cache
vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03 10:03:41 +02:00
..
cpuid.c KVM: x86: Replace X86_FEATURE_NX offset with the definition 2014-08-21 13:50:23 +02:00
cpuid.h KVM: x86: DR6/7.RTM cannot be written 2014-07-21 17:17:52 +02:00
emulate.c KVM: x86: remove Aligned bit from movntps/movntpd 2014-08-29 14:57:59 +02:00
i8254.c KVM: x86: limit PIT timer frequency 2014-01-15 12:43:54 +01:00
i8254.h KVM: fold kvm_pit_timer into kvm_kpit_state 2012-08-01 00:21:07 -03:00
i8259.c KVM: inject ExtINT interrupt before APIC interrupts 2012-12-13 23:05:21 -02:00
irq.c KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use 2014-08-05 15:00:24 +02:00
irq.h KVM: switch to symbolic name for irq_states size 2012-07-20 16:12:16 -03:00
Kconfig KVM: Give IRQFD its own separate enabling Kconfig option 2014-08-05 14:26:28 +02:00
kvm_cache_regs.h KVM: MMU: Do not unconditionally read PDPTE from guest memory 2011-09-25 19:18:01 +03:00
lapic.c KVM: x86: recalculate_apic_map after enabling apic 2014-08-19 15:12:29 +02:00
lapic.h KVM: x86: Validate guest writes to MSR_IA32_APICBASE 2014-01-27 14:39:44 +01:00
Makefile kvm: Add VFIO device 2013-10-30 19:02:03 +01:00
mmu_audit.c arch/x86: replace strict_strto calls 2014-08-08 15:57:28 -07:00
mmu.c kvm: fix potentially corrupt mmio cache 2014-09-03 10:03:41 +02:00
mmu.h KVM: MMU: flush tlb out of mmu lock when write-protect the sptes 2014-04-23 17:49:52 -03:00
mmutrace.h x86/kvm: Resolve shadow warnings in macro expansion 2014-07-31 16:33:29 +02:00
paging_tmpl.h Revert "KVM: Simplify kvm->tlbs_dirty handling" 2014-04-23 17:49:48 -03:00
pmu.c KVM: x86: Clarify PMU related features bit manipulation 2014-08-20 13:01:25 +02:00
svm.c KVM: remove garbage arg to *hardware_{en,dis}able 2014-08-29 16:35:55 +02:00
trace.h kvm: x86: fix tracing for 32-bit 2014-08-25 16:08:21 +02:00
tss.h KVM: x86: hardware task switching support 2008-04-27 12:00:39 +03:00
vmx.c KVM: remove garbage arg to *hardware_{en,dis}able 2014-08-29 16:35:55 +02:00
x86.c KVM: x86: use guest maxphyaddr to check MTRR values 2014-08-29 18:56:24 +02:00
x86.h KVM: vmx: vmx instructions handling does not consider cs.l 2014-06-19 12:52:15 +02:00