linux/arch/x86
David Matlack ee3d1570b5 kvm: fix potentially corrupt mmio cache
vcpu exits and memslot mutations can run concurrently as long as the
vcpu does not aquire the slots mutex. Thus it is theoretically possible
for memslots to change underneath a vcpu that is handling an exit.

If we increment the memslot generation number again after
synchronize_srcu_expedited(), vcpus can safely cache memslot generation
without maintaining a single rcu_dereference through an entire vm exit.
And much of the x86/kvm code does not maintain a single rcu_dereference
of the current memslots during each exit.

We can prevent the following case:

   vcpu (CPU 0)                             | thread (CPU 1)
--------------------------------------------+--------------------------
1  vm exit                                  |
2  srcu_read_unlock(&kvm->srcu)             |
3  decide to cache something based on       |
     old memslots                           |
4                                           | change memslots
                                            | (increments generation)
5                                           | synchronize_srcu(&kvm->srcu);
6  retrieve generation # from new memslots  |
7  tag cache with new memslot generation    |
8  srcu_read_unlock(&kvm->srcu)             |
...                                         |
   <action based on cache occurs even       |
    though the caching decision was based   |
    on the old memslots>                    |
...                                         |
   <action *continues* to occur until next  |
    memslot generation change, which may    |
    be never>                               |
                                            |

By incrementing the generation after synchronizing with kvm->srcu readers,
we ensure that the generation retrieved in (6) will become invalid soon
after (8).

Keeping the existing increment is not strictly necessary, but we
do keep it and just move it for consistency from update_memslots to
install_new_memslots.  It invalidates old cached MMIOs immediately,
instead of having to wait for the end of synchronize_srcu_expedited,
which makes the code more clearly correct in case CPU 1 is preempted
right after synchronize_srcu() returns.

To avoid halving the generation space in SPTEs, always presume that the
low bit of the generation is zero when reconstructing a generation number
out of an SPTE.  This effectively disables MMIO caching in SPTEs during
the call to synchronize_srcu_expedited.  Using the low bit this way is
somewhat like a seqcount---where the protected thing is a cache, and
instead of retrying we can simply punt if we observe the low bit to be 1.

Cc: stable@vger.kernel.org
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03 10:03:41 +02:00
..
boot * Enforce CONFIG_RELOCATABLE for the x86 EFI boot stub, otherwise 2014-08-11 13:58:54 -07:00
configs USB: remove CONFIG_USB_DEBUG from defconfig files 2014-05-28 09:40:45 -07:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2014-08-04 09:52:51 -07:00
ia32 x86, vdso: Reimplement vdso.so preparation in build-time C 2014-05-05 13:18:51 -07:00
include KVM: remove garbage arg to *hardware_{en,dis}able 2014-08-29 16:35:55 +02:00
kernel PCI changes for the v3.17 merge window (part 2): 2014-08-14 18:10:33 -06:00
kvm kvm: fix potentially corrupt mmio cache 2014-09-03 10:03:41 +02:00
lguest asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
lib Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
math-emu asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
mm memory-hotplug: x86_32: suitable memory should go to ZONE_MOVABLE 2014-08-06 18:01:21 -07:00
net net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
oprofile x86, oprofile, nmi: Fix CPU hotplug callback registration 2014-03-20 13:43:43 +01:00
pci Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-13 18:23:32 -06:00
platform Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-13 18:23:32 -06:00
power x86, power, suspend: Annotate restore_processor_state() with notrace 2014-07-17 09:45:05 -04:00
purgatory kexec: support for kexec on panic using new system call 2014-08-08 15:57:33 -07:00
realmode x86/build: Supress realmode.bin is up to date message 2014-04-16 15:17:24 +02:00
syscalls kexec: new syscall kexec_file_load() declaration 2014-08-08 15:57:32 -07:00
tools x86/build: Supress "Nothing to be done for ..." messages 2014-04-14 11:44:36 +02:00
um Merge branch 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc 2014-08-09 09:58:12 -07:00
vdso arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area 2014-08-08 15:57:27 -07:00
video
xen x86/xen: use vmap() to map grant table pages in PVH guests 2014-08-11 11:59:35 +01:00
.gitignore
Kbuild purgatory: core purgatory functionality 2014-08-08 15:57:32 -07:00
Kconfig Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-13 18:23:32 -06:00
Kconfig.cpu Merge branch 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 13:15:58 -07:00
Kconfig.debug x86/efi: Dump the EFI page table 2014-03-04 16:17:17 +00:00
Makefile purgatory: core purgatory functionality 2014-08-08 15:57:32 -07:00
Makefile_32.cpu
Makefile.um