No paravirt version of set_pmd_at/pmd_update/pmd_update_defer.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paravirt ops pmd_update/pmd_update_defer/pmd_set_at. Not all might be
necessary (vmware needs pmd_update, Xen needs set_pmd_at, nobody needs
pmd_update_defer), but this is to keep full simmetry with pte paravirt
ops, which looks cleaner and simpler from a common code POV.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Used by paravirt and not paravirt set_pmd_at.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alter compound get_page/put_page to keep references on subpages too, in
order to allow __split_huge_page_refcount to split an hugepage even while
subpages have been pinned by one of the get_user_pages() variants.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Four architectures (arm, mips, sparc, x86) use __vmalloc_area() for
module_init(). Much of the code is duplicated and can be generalized in a
globally accessible function, __vmalloc_node_range().
__vmalloc_node() now calls into __vmalloc_node_range() with a range of
[VMALLOC_START, VMALLOC_END) for functionally equivalent behavior.
Each architecture may then use __vmalloc_node_range() directly to remove
the duplication of code.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/avr32-2.6:
avr32: update default configuration files for Atmel boards
avr32: Convert to clocksource_register_hz
avr32: make architecture sys_clone prototype match asm-generic prototype
avr32: use syscall prototypes from asm-generic instead of arch
avr32: disable kprobes for all default configurations
avr32: boards: setup: use IS_ERR() instead of NULL check
This patch adjusts some values to make the default configuration for Atmel
boards more similar, and adds missing values to enable required functions. Also
remove defined symbols for functions not in use.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
This converts the avr32 clocksource to use clocksource_register_hz.
This is untested, so any assistance in testing would be appreciated!
CC: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
This patch will fix the arguments to the architecture sys_clone() function to
match the asm-generic/syscalls.h prototype. In the same go remove the
architecture specific prototype for the same function.
The sys_clone() function is only called from assembly, hence the argument types
were not having any affect.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
This patch removes the redundant syscalls prototypes in the architecture
specific syscalls.h header file. These were identical with the ones in
asm-generic/syscalls.h.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Reported-by: Peter Huewe <PeterHuewe@gmx.de>
Reported-by: Sven Schnelle <svens@stackframe.org>
Cc: stable <stable@kernel.org>
This patch will disable kprobes for all the default AVR32 board configurations.
This works around a regression in kprobes which seems to be related to AVR32 is
now lacking the struct kprobe_ctlblk.
Signed-off-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
* 'rmobile-latest' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
ARM: mach-shmobile: Kill off unused !gpio_is_valid() case
ARM: mach-shmobile: sh7372 Enable SDIO IRQs for Mackerel
ARM: mach-shmobile: sh7377 Enable SDIO IRQs
ARM: mach-shmobile: sh7367 Enable SDIO IRQs
ARM: mach-shmobile: sh7372 Enable SDIO IRQs
ARM: mach-shmobile: mackerel: Add touchscreen ST1232 support
ARM: mach-shmobile: ap4eb: SCIF port for earlyprintk when using zboot
ARM: mach-shmobile: mackerel: SCIF port for earlyprintk when using zboot
ARM: mach-shmobile: mackerel: Add support get_cd in CN23
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6: (29 commits)
video: move SH_MIPI_DSI/SH_LCD_MIPI_DSI to the top of menu
fbdev: Implement simple blanking in pseudocolor modes for vt8500lcdfb
video: imx: Update the manufacturer's name
nuc900fb: don't treat NULL clk as an error
s3c2410fb: don't treat NULL clk as an error
video: tidy up modedb formatting.
video: matroxfb: Correct video option in comments and kernel config help.
fbdev: sh_mobile_hdmi: simplify pointer handling
fbdev: sh_mobile_hdmi: framebuffer notifiers have to be registered
fbdev: sh_mobile_hdmi: add command line option to use the preferred EDID mode
OMAP: DSS2: Introduce omap_channel as an omap_dss_device parameter, add new overlay manager.
OMAP: DSS2: Use dss_features to handle DISPC bits removed on OMAP4
OMAP: DSS2: LCD2 Channel Changes for DISPC
OMAP: DSS2: Change remaining DISPC functions for new omap_channel argument
OMAP: DSS2: Introduce omap_channel argument to DISPC functions used by interface drivers
OMAP: DSS2: Represent DISPC register defines with channel as parameter
OMAP: DSS2: Add dss_features for omap4 and overlay manager related features
OMAP: DSS2: Clean up DISPC color mode validation checks
OMAP: DSS2: Add back authors of panel-generic.c based drivers
OMAP: DSS2: remove generic DPI panel driver duplicated panel drivers
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (41 commits)
fs: add documentation on fallocate hole punching
Gfs2: fail if we try to use hole punch
Btrfs: fail if we try to use hole punch
Ext4: fail if we try to use hole punch
Ocfs2: handle hole punching via fallocate properly
XFS: handle hole punching via fallocate properly
fs: add hole punching to fallocate
vfs: pass struct file to do_truncate on O_TRUNC opens (try #2)
fix signedness mess in rw_verify_area() on 64bit architectures
fs: fix kernel-doc for dcache::prepend_path
fs: fix kernel-doc for dcache::d_validate
sanitize ecryptfs ->mount()
switch afs
move internal-only parts of ncpfs headers to fs/ncpfs
switch ncpfs
switch 9p
pass default dentry_operations to mount_pseudo()
switch hostfs
switch affs
switch configfs
...
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
watchdog: Add MCF548x watchdog driver.
watchdog: add driver for the Atheros AR71XX/AR724X/AR913X SoCs
watchdog: Add TCO support for nVidia chipsets
watchdog: Add support for sp5100 chipset TCO
watchdog: f71808e_wdt: add F71862FG, F71869 to Kconfig
watchdog: iTCO_wdt: TCO Watchdog patch for Intel DH89xxCC PCH
watchdog: iTCO_wdt: TCO Watchdog patch for Intel NM10 DeviceIDs
watchdog: ks8695_wdt: include mach/hardware.h instead of mach/timex.h.
watchdog: Propagate Book E WDT period changes to all cores
watchdog: add CONFIG_WATCHDOG_NOWAYOUT support to PowerPC Book-E watchdog driver
watchdog: alim7101_wdt: fix compiler warning on alim7101_pci_tbl
watchdog: alim1535_wdt: fix compiler warning on ali_pci_tbl
watchdog: Fix reboot on W83627ehf chipset.
watchdog: Add watchdog support for W83627DHG chip
watchdog: f71808e_wdt: Add Fintek F71869 watchdog
watchdog: add f71862fg support
watchdog: clean-up f71808e_wdt.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (45 commits)
regulator: missing index in PTR_ERR() in isl6271a_probe()
regulator: Assign return value of mc13xxx_reg_rmw to ret
regulator: Add initial per-regulator debugfs support
regulator: Make regulator_has_full_constraints a bool
regulator: Clean up logging a bit
regulator: Optimise out noop voltage changes
regulator: Add API to re-apply voltage to hardware
regulator: Staticise non-exported functions in mc13892
regulator: Only notify voltage changes when they succeed
regulator: Provide a selector based set_voltage_sel() operation
regulator: Factor out voltage set operation into a separate function
regulator: Convert WM8994 to use get_voltage_sel()
regulator: Convert WM835x to use get_voltage_sel()
regulator: Allow modular build of mc13xxx-core
regulator: support PMIC mc13892
make mc13783 regulator code generic
Change the register name definitions for mc13783
mach-ux500: Updated and connected ab8500 regulator board configuration
regulators: Removed macros for initialization of ab8500 regulators
regulators: Added verbose debug messages to ab8500 regulators
...
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, olpc: Speed up device tree creation during boot
x86, olpc: Add OLPC device-tree support
x86, of: Define irq functions to allow drivers/of/* to build on x86
* 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
KVM: Initialize fpu state in preemptible context
KVM: VMX: when entering real mode align segment base to 16 bytes
KVM: MMU: handle 'map_writable' in set_spte() function
KVM: MMU: audit: allow audit more guests at the same time
KVM: Fetch guest cr3 from hardware on demand
KVM: Replace reads of vcpu->arch.cr3 by an accessor
KVM: MMU: only write protect mappings at pagetable level
KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
KVM: MMU: Initialize base_role for tdp mmus
KVM: VMX: Optimize atomic EFER load
KVM: VMX: Add definitions for more vm entry/exit control bits
KVM: SVM: copy instruction bytes from VMCB
KVM: SVM: implement enhanced INVLPG intercept
KVM: SVM: enhance mov DR intercept handler
KVM: SVM: enhance MOV CR intercept handler
KVM: SVM: add new SVM feature bit names
KVM: cleanup emulate_instruction
KVM: move complete_insn_gp() into x86.c
KVM: x86: fix CR8 handling
KVM guest: Fix kvm clock initialization when it's configured out
...
This integrates the XZ decompression code to the x86 pre-boot code.
mkpiggy.c is updated to reserve about 32 KiB more buffer safety margin for
kernel decompression. It is done unconditionally for all decompressors to
keep the code simpler.
The XZ decompressor needs around 30 KiB of heap, so the heap size is
increased to 32 KiB on both x86-32 and x86-64.
Documentation/x86/boot.txt is updated to list the XZ magic number.
With the x86 BCJ filter in XZ, XZ-compressed x86 kernel tends to be a few
percent smaller than the equivalent LZMA-compressed kernel.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alain Knaff <alain@knaff.lu>
Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In fsl_rio_dbell_handler() the code currently simply acknowledges the QFI
queue full interrupt, but does nothing to resolve the queue full
condition. Instead, it jumps to the end of the isr. When a queue full
condition occurs, the isr is then re-entered immediately and continually,
forever.
The fix is to just fall through and read out current doorbell entries.
Signed-off-by: Thomas Taranowski <tom@baringforge.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drop the old geode_gpio crud, as well as the raw outl() calls; instead,
use the Linux GPIO API where possible, and the cs5535_gpio API in other
places.
Note that we don't actually clean up the driver properly yet (once loaded,
it always remains loaded). That'll come later..
This patch is necessary for building the driver.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For arch which needs USE_GENERIC_SMP_HELPERS, it has to select
USE_GENERIC_SMP_HELPERS, rather than leaving a choice to user, since they
don't provide their own implementions.
Also, move on_each_cpu() to kernel/smp.c, it is strange to put it in
kernel/softirq.c.
For arch which doesn't use USE_GENERIC_SMP_HELPERS, e.g. blackfin, only
on_each_cpu() is compiled.
Signed-off-by: Amerigo Wang <amwang@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Occasionally the system gets into a state where the CMOS clock has gotten
slightly ahead of current time and the periodic update of RTC fails. The
message is a nuisance and repeats spamming the log.
See: http://www.ntp.org/ntpfaq/NTP-s-trbl-spec.htm#Q-LINUX-SET-RTC-MMSS
Rather than just removing the message, make it show only once and reduce
severity since it indicates a normal and non urgent condition.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify write file operation for mmapper by using
simple_write_to_buffer().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
unregister_winch() should use list_for_each_safe(), as it can delete from
the list.
Signed-off-by: Will Newton <will.newton@gmail.com>
Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently CONFIG_HIGHMEM is broken on User Mode Linux. I'm not sure if it
worked ever.
It doesn't compile and this breaks randomconfig testing.
Signed-off-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6aa85a5ae6 (omap4: 4430sdp:
enable the ehci port on 4430SDP) added code to enable EHCI
support on 4430sdp board.
Looks like the ULPI pin does not seem to be muxed properly on ES1.0
SDP and this causes the system to reboot when the ULPI PHY is
enabled.
Fix this by muxing the pin, this is the same setting for
both ES1.0 and ES2.0. Also add checking for gpio_request.
Cc: Keshava Munegowda <keshava_mgowda@ti.com
Signed-off-by: Tony Lindgren <tony@atomide.com>
This patch is adding support for pwm1 and pwm2 devices found
on mx51.
[ this patch has been tested with pwm-backlight driver ]
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Commit 076762aa52 is adding a macro whis is
calling imx_add_mxc_pwm() but gives it 2 parameters while it's taking only
one parameters.
Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
This adds preliminary support for the alpha project AP-SH4AD-0A reference
platform (SH7786 based).
Additional platform information available at:
http://www.apnet.co.jp/product/superh/ap-sh4ad-0a.html
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This adds preliminary support for the alpha project AP-SH4A-3A reference
platform (SH7785 based).
Additional paltform information available at:
http://www.apnet.co.jp/product/superh/ap-sh4a-3a.html
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
The Card Detect GPIOs used on AP4EVB and Mackerel are
alwayws valid, so kill off the unused !gpio_is_valid()
case.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Add probe support for the sh7372 SH4AL-DSP core.
The most common use case for this is when the system
boots from the ARM core in the sh7372 and uses the
SH core for application offload as a slave CPU.
May also be used to boot the sh7372 from the SH core.
Signed-off-by: Magnus Damm <damm@opensource.se>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
This patch adds support System MMU for S5PV310 and S5PC210.
Signed-off-by: Donguk Ryu <du.ryu@samsung.com>
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
[kgene.kim@samsung.com: changed SYSMMU config name]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
This patch adds support System MMU which supports address transition
from virtual address to physical address. Basically, each hardware
block is connected System MMU block can use directly vitrual address
when it accesses physical memory not using physical address.
Signed-off-by: Donguk Ryu <du.ryu@samsung.com>
Signed-off-by: Sangbeom Kim <sbkim73@samsung.com>
[kgene.kim@samsung.com: removed useless codes]
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Fix the warnings genereted by arch/powerpc/include/asm/immap_qe.h when
CONFIG_PHYS_ADDR_T_64BIT is defined:
immap_qe.h: In function 'immrbar_virt_to_phys':
immap_qe.h:472:8: warning: cast from pointer to integer of different size
immap_qe.h:472:24: warning: cast from pointer to integer of different size
immap_qe.h:473:5: warning: cast from pointer to integer of different size
immap_qe.h:473:21: warning: cast from pointer to integer of different size
immap_qe.h:474:36: warning: cast from pointer to integer of different size
Note that the QE does not support 36-bit physical addresses, so even when
CONFIG_PHYS_ADDR_T_64BIT is defined, the QE MURAM must be located below the
4GB boundary.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
In order to prevent the fsl_dma driver from claiming the DMA channels that the
P1022DS audio driver needs, the compatible properties for those nodes must say
"fsl,ssi-dma-channel" instead of "fsl,eloplus-dma-channel".
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
MPC8308 has ULPI pin muxing settings in SICRH register, bits 17-18
which is different from both MPC8313 and MPC8315.
Also MPC8308 doesn't have REFSEL, UTMI_PHY_EN and OTG_PORT fields
in the USB DR controller CONTROL register.
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Tested-by: Wolfgang Denk <wd@denx.de>
Acked-by: Wolfgang Denk <wd@denx.de>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Moved setting of RFXE bit so we get machine checks on RIO errors into
cpu_setup so that the RIO code isn't core specific.
Signed-off-by: Shaohui Xie <b21989@freescale.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Also make 74xx HID1 definition conditional.
Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Shaohui Xie <b21989@freescale.com>
Cc: Roy Zang <tie-fei.zang@freescale.com>
Cc: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Currently intel_idle and acpi_idle driver show double cpu_idle "exit idle"
events -> this patch fixes it and makes cpu_idle events throwing less complex.
It also introduces cpu_idle events for all architectures which use
the cpuidle subsystem, namely:
- arch/arm/mach-at91/cpuidle.c
- arch/arm/mach-davinci/cpuidle.c
- arch/arm/mach-kirkwood/cpuidle.c
- arch/arm/mach-omap2/cpuidle34xx.c
- arch/drivers/acpi/processor_idle.c (for all cases, not only mwait)
- arch/x86/kernel/process.c (did throw events before, but was a mess)
- drivers/idle/intel_idle.c (did throw events before)
Convention should be:
Fire cpu_idle events inside the current pm_idle function (not somewhere
down the the callee tree) to keep things easy.
Current possible pm_idle functions in X86:
c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
-> this is really easy is now.
This affects userspace:
The type field of the cpu_idle power event can now direclty get
mapped to:
/sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
instead of throwing very CPU/mwait specific values.
This change is not visible for the intel_idle driver.
For the acpi_idle driver it should only be visible if the vendor
misses out C-states in his BIOS.
Another (perf timechart) patch reads out cpuidle info of cpu_idle
events from:
/sys/.../cpuidle/stateX/*, then the cpuidle events are mapped
to the correct C-/cpuidle state again, even if e.g. vendors miss
out C-states in their BIOS and for example only export C1 and C3.
-> everything is fine.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Robert Schoene <robert.schoene@tu-dresden.de>
CC: Jean Pihet <j-pihet@ti.com>
CC: Arjan van de Ven <arjan@linux.intel.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-pm@lists.linux-foundation.org
CC: linux-acpi@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-perf-users@vger.kernel.org
CC: linux-omap@vger.kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
The kerneldoc for this function is at odds with the DMA-API
document, which holds, so fix it.
Signed-off-by: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
arch/ia64/kernel/acpi.c:481: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’
Introduced by commit 05f2f274c8
[IA64] Avoid array overflow if there are too many cpus in SRAT table
Signed-off-by: Tony Luck <tony.luck@intel.com>
Having four variables for the same thing:
idle_halt, idle_nomwait, force_mwait and boot_option_idle_overrides
is rather confusing and unnecessary complex.
if idle= boot param is passed, only set up one variable:
boot_option_idle_overrides
Introduces following functional changes/fixes:
- intel_idle driver does not register if any idle=xy
boot param is passed.
- processor_idle.c will also not register a cpuidle driver
and get active if idle=halt is passed.
Before a cpuidle driver with one (C1, halt) state got registered
Now the default_idle function will be used which finally uses
the same idle call to enter sleep state (safe_halt()), but
without registering a whole cpuidle driver.
That means idle= param will always avoid cpuidle drivers to register
with one exception (same behavior as before):
idle=nomwait
may still register acpi_idle cpuidle driver, but C1 will not use
mwait, but hlt. This can be a workaround for IO based deeper sleep
states where C1 mwait causes problems.
Signed-off-by: Thomas Renninger <trenn@suse.de>
cc: x86@kernel.org
Signed-off-by: Len Brown <len.brown@intel.com>
To make mc13783 and mc13892 share code, the register names should be
changed to fit the new macro definitions in the comming patch.
Signed-off-by: Yong Shen <yong.shen@linaro.org>
Acked-by: Sascha Hauer <s.hauer@pengutronix.de>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
The ab8500 regulator board configuration is updated and put in an
array which can easily be used in the MFD board configuration. The
regulator board configuration is also added to the MFD
configuration in this patch.
Signed-off-by: Bengt Jonsson <bengt.g.jonsson@stericsson.com>
Acked-by: Linus Walleij <linus.walleij@stericsson.com>
Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
The CLZ instruction does not alter the condition flags, so remove the
"cc" clobber from the inline asm for fls().
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When CONFIG_CMDLINE_FORCE is used, the warning
Ignoring unrecognised tag 0x54410009
was displayed. Change this to
Ignoring tag cmdline (using the default kernel command line)
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add support for dynamical allocation of imx-keypad on mx5 platform.
After moving to dynamically registration of the keypad, the keypad clock
name needs to change accordingly.
The reason is that the original mx5 keypad platform_device id was 0,
now we use id=-1 as per arch/arm/plat-mxc/devices/platform-imx-keypad.c.
Tested keypad successfully on a MX51_3DS board.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
The mxs duart is actually an amba-pl011 device. This commit changes
the duart device code to dynamically allocate amba-pl011 device,
so that drivers/serial/amba-pl011.c can be used on mxs.
Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
init_fpu() (which is indirectly called by the fpu switching code) assumes
it is in process context. Rather than makeing init_fpu() use an atomic
allocation, which can cause a task to be killed, make sure the fpu is
already initialized when we enter the run loop.
KVM-Stable-Tag.
Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Move the operation of 'writable' to set_spte() to clean up code
[avi: remove unneeded booleanification]
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
It only allows to audit one guest in the system since:
- 'audit_point' is a glob variable
- mmu_audit_disable() is called in kvm_mmu_destroy(), so audit is disabled
after a guest exited
this patch fix those issues then allow to audit more guests at the same time
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Instead of syncing the guest cr3 every exit, which is expensince on vmx
with ept enabled, sync it only on demand.
[sheng: fix incorrect cr3 seen by Windows XP]
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If a pagetable contains a writeable large spte, all of its sptes will be
write protected, including non-leaf ones, leading to endless pagefaults.
Do not write protect pages above PT_PAGE_TABLE_LEVEL, as the spte fault
paths assume non-leaf sptes are writable.
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
'error' is byte sized, so use a byte register constraint.
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When NX is enabled on the host but not on the guest, we use the entry/exit
msr load facility, which is slow. Optimize it to use entry/exit efer load,
which is ~1200 cycles faster.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In case of a nested page fault or an intercepted #PF newer SVM
implementations provide a copy of the faulting instruction bytes
in the VMCB.
Use these bytes to feed the instruction emulator and avoid the costly
guest instruction fetch in this case.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When the DecodeAssist feature is available, the linear address
is provided in the VMCB on INVLPG intercepts. Use it directly to
avoid any decoding and emulation.
This is only useful for shadow paging, though.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Newer SVM implementations provide the GPR number in the VMCB, so
that the emulation path is no longer necesarry to handle debug
register access intercepts. Implement the handling in svm.c and
use it when the info is provided.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Newer SVM implementations provide the GPR number in the VMCB, so
that the emulation path is no longer necesarry to handle CR
register access intercepts. Implement the handling in svm.c and
use it when the info is provided.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
the recent APM Vol.2 and the recent AMD CPUID specification describe
new CPUID features bits for SVM. Name them here for later usage.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
emulate_instruction had many callers, but only one used all
parameters. One parameter was unused, another one is now
hidden by a wrapper function (required for a future addition
anyway), so most callers use now a shorter parameter list.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
move the complete_insn_gp() helper function out of the VMX part
into the generic x86 part to make it usable by SVM.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The handling of CR8 writes in KVM is currently somewhat cumbersome.
This patch makes it look like the other CR register handlers
and fixes a possible issue in VMX, where the RIP would be incremented
despite an injected #GP.
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In KVM_CREATE_IRQCHIP, kvm_io_bus_unregister_dev() is called without taking
slots_lock in the error handling path.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
If KVM sees a read-only host page, it will map it as read-only to prevent
breaking a COW. However, if the page was part of a large guest page, KVM
incorrectly extends the write protection to the entire large page frame
instead of limiting it to the normal host page.
This results in the instantiation of a new shadow page with read-only access.
If this happens for a MOVS instruction that moves memory between two normal
pages, within a single large page frame, and mapped within the guest as a
large page, and if, in addition, the source operand is not writeable in the
host (perhaps due to KSM), then KVM will instantiate a read-only direct
shadow page, instantiate an spte for the source operand, then instantiate
a new read/write direct shadow page and instantiate an spte for the
destination operand. Since these two sptes are in different shadow pages,
MOVS will never see them at the same time and the guest will not make
progress.
Fix by mapping the direct shadow page read/write, and only marking the
host page read-only.
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the xsetbv intercept to the AMD part
of KVM. This makes AVX usable in a save way for the guest on
AVX capable AMD hardware.
The patch is tested by using AVX in the guest and host in
parallel and checking for data corruption. I also used the
KVM xsave unit-tests and they all pass.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Large page information has two elements but one of them, write_count, alone
is accessed by a helper function.
This patch replaces this helper function with more generic one which returns
newly named kvm_lpage_info structure and use it to access the other element
rmap_pde.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
In certain use-cases, we want to allocate guests fixed time slices where idle
guest cycles leave the machine idling. There are many approaches to achieve
this but the most direct is to simply avoid trapping the HLT instruction which
lets the guest directly execute the instruction putting the processor to sleep.
Introduce this as a module-level option for kvm-vmx.ko since if you do this
for one guest, you probably want to do it for all.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the new flush-by-asid of upcoming AMD
processors to the KVM-AMD module.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch replaces all calls to force_new_asid which are
intended to flush the guest-tlb by the more appropriate
function svm_flush_tlb. As a side-effect the force_new_asid
function is removed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This function is unused and there is svm_flush_tlb which
does the same. So this function can be removed.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Retry #PF for softmmu only when the current vcpu has the same cr3 as the time
when #PF occurs
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Retry #PF is the speculative path, so don't set the accessed bit
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
It's the speculative path if 'no_apf = 1' and we will specially handle this
speculative path in the later patch, so 'prefault' is better to fit the sense.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for all LBR related
state. This includes the debugctl, br_from, br_to,
last_excp_from, and last_excp_to msrs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for the cr2 register in
the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit defined for the cs, ds,
ss, an es segemnts and the current cpl saved in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for the base and limit
of the gdt and idt in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for the dr6 and dr7
debug registers in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the CRx clean-bit for the vmcb. This
bit covers cr0, cr3, cr4, and efer.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for all nested paging
related state in the vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for all interrupt
related state in the vmcb. This corresponds to vmcb offset
0x60-0x67.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch implements the clean-bit for the asid in the
vmcb.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the clean bit for the physical addresses of
the MSRPM and the IOPM. It does not need to be set in the
code because the only place where these values are changed
is the nested-svm vmrun and vmexit path. These functions
already mark the complete VMCB as dirty.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the clean-bit for intercepts-vectors, the
TSC offset and the pause-filter count to the appropriate
places. The IO and MSR permission bitmaps are not subject to
this bit.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch adds the infrastructure for the implementation of
the individual clean-bits.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
One more "KVM: MMU: Don't drop accessed bit while updating an spte."
Sptes are accessed by both kvm and hardware.
This patch uses update_spte() to fix the way of removing write access.
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
If we execute VMREAD during reboot we'll just skip over it. Instead of
returning garbage, return 0, which has a much smaller chance of confusing
the code. Otherwise we risk a flood of debug printk()s which block the
reboot process if a serial console or netconsole is enabled.
Signed-off-by: Avi Kivity <avi@redhat.com>
Since vmx blocks INIT signals, we disable virtualization extensions during
reboot. This leads to virtualization instructions faulting; we trap these
faults and spin while the reboot continues.
Unfortunately spinning on a non-preemptible kernel may block a task that
reboot depends on; this causes the reboot to hang.
Fix by skipping over the instruction and hoping for the best.
Signed-off-by: Avi Kivity <avi@redhat.com>
This patch replaces the open-coded vmcb-selection for the
TSC calculation with the new get_host_vmcb helper function
introduced in this patchset.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch wraps changes to the misc intercepts of SVM
into seperate functions to abstract nested-svm better and
prepare the implementation of the vmcb-clean-bits feature.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch wraps changes to the exception intercepts of SVM
into seperate functions to abstract nested-svm better and
prepare the implementation of the vmcb-clean-bits feature.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch wraps changes to the DRx intercepts of SVM into
seperate functions to abstract nested-svm better and prepare
the implementation of the vmcb-clean-bits feature.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch wraps changes to the CRx intercepts of SVM into
seperate functions to abstract nested-svm better and prepare
the implementation of the vmcb-clean-bits feature.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch adds a function to recalculate the effective
intercepts masks when the vcpu is in guest-mode and either
the host or the guest intercept masks change.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch prevents that emulation failures which result
from emulating an instruction for an L2-Guest results in
being reported to userspace.
Without this patch a malicious L2-Guest would be able to
kill the L1 by triggering a race-condition between an vmexit
and the instruction emulator.
With this patch the L2 will most likely only kill itself in
this situation.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch replaces the is_nested logic in the SVM module
with the generic notion of guest-mode.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This patch introduces a generic representation of guest-mode
fpr a vcpu. This currently only exists in the SVM code.
Having this representation generic will help making the
non-svm code aware of nesting when this is necessary.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently page fault cr2 and nesting infomation are carried outside
the fault data structure. Instead they are placed in the vcpu struct,
which results in confusion as global variables are manipulated instead
of passing parameters.
Fix this issue by adding address and nested fields to struct x86_exception,
so this struct can carry all information associated with a fault.
Signed-off-by: Avi Kivity <avi@redhat.com>
Tested-by: Joerg Roedel <joerg.roedel@amd.com>
Tested-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Immediately after we generate an exception, we want a X86EMUL_PROPAGATE_FAULT
constant, so return it from the generation functions.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of checking for X86EMUL_PROPAGATE_FAULT, check for any error,
making the callers more reliable.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If rc == X86EMUL_PROPAGATE_FAULT, we would have returned earlier.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Introduce a structure that can contain an exception to be passed back
to main kvm code.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit somewhere
| that is cleareded when we flush the tlb. kvm_mmu_notifier_invalidate_page()
| can consult the bit and force a flush if set.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Introduce a common function to map invalid gpte
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Remove it since we can judge it by using sp->unsync
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Rename it to fit its sense better
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We just need flush tlb if overwrite a writable spte with a read-only one.
And we should move this operation to set_spte() for sync_page path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
We should flush all tlbs after drop spte on sync_page path since
Quote from Avi:
| sync_page
| drop_spte
| kvm_mmu_notifier_invalidate_page
| kvm_unmap_rmapp
| spte doesn't exist -> no flush
| page is freed
| guest can write into freed page?
KVM-Stable-Tag.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
The exit reason alone is insufficient to understand exactly why an exit
occured; add ISA-specific trace parameters for additional information.
Because fetching these parameters is expensive on vmx, and because these
parameters are fetched even if tracing is disabled, we fetch the
parameters via a callback instead of as traditional trace arguments.
Signed-off-by: Avi Kivity <avi@redhat.com>
exit_reason's meaning depend on the instruction set; record it so a trace
taken on one machine can be interpreted on another.
Signed-off-by: Avi Kivity <avi@redhat.com>
cea15c2 ("KVM: Move KVM context switch into own function") split vmx_vcpu_run()
to prevent multiple copies of the context switch from being generated (causing
problems due to a label). This patch folds them back together again and adds
the __noclone attribute to prevent the label from being duplicated.
Signed-off-by: Avi Kivity <avi@redhat.com>
Linear addresses are supposed to already have segment checks performed on them;
if we play with these addresses the checks become invalid.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently the x86 emulator converts the segment register associated with
an operand into a segment base which is added into the operand address.
This loss of information results in us not doing segment limit checks properly.
Replace struct operand's addr.mem field by a segmented_address structure
which holds both the effetive address and segment. This will allow us to
do the limit check at the point of access.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Failed emulation is reported via a tracepoint; the cmps printk is pointless.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Inform user to either disable TXT in the BIOS or do TXT launch
with tboot before enabling KVM since some BIOSes do not set
FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled.
Signed-off-by: Shane Wang <shane.wang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If reserved bit is set, we need inject the #PF with PFEC.RSVD=1,
but shadow_notrap_nonpresent_pte injects #PF with PFEC.RSVD=0 only
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
This allows Linux to mask cpuid bits if, for example, nx is enabled on only
some cpus.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Instead of querying cpuid directly, use the Linux accessors (boot_cpu_has,
etc.). This allows the things like the clearcpuid kernel command line to
work (when it's fixed wrt scattered cpuid bits).
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If apf is generated in L2 guest and is completed in L1 guest, it will
prefault this apf in L1 guest's mmu context.
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
If CR0.PG is changed, the page fault cann't be avoid when the prefault address
is accessed later
And it also fix a bug: it can retry a page enabled #PF in page disabled context
if mmu is shadow page
This idear is from Gleb Natapov
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
IA64 support forces us to abstract the allocation of the kvm structure.
But instead of mixing this up with arch-specific initialization and
doing the same on destruction, split both steps. This allows to move
generic destruction calls into generic code.
It also fixes error clean-up on failures of kvm_create_vm for IA64.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Changed makefile to use the ccflags-y option instead of EXTRA_CFLAGS.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
Remove the declaration of kvm_mmu_set_base_ptes()
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
While not mandated by the spec, Linux relies on NMI being blocked by an
IF-enabling STI. VMX also refuses to enter a guest in this state, at
least on some implementations.
Disallow NMI while blocked by STI by checking for the condition, and
requesting an interrupt window exit if it occurs.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
In current code, it checks async pf completion out of the wait context,
like this:
if (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
!vcpu->arch.apf.halted)
r = vcpu_enter_guest(vcpu);
else {
......
kvm_vcpu_block(vcpu)
^- waiting until 'async_pf.done' is not empty
}
kvm_check_async_pf_completion(vcpu)
^- delete list from async_pf.done
So, if we check aysnc pf completion first, it can be blocked at
kvm_vcpu_block
Fixed by mark the vcpu is unhalted in kvm_check_async_pf_completion()
path
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tracing 'async' and *pfn is useless, since 'async' is always true,
and '*pfn' is always "fault_pfn'
We can trace 'gva' and 'gfn' instead, it can help us to see the
life-cycle of an async_pf
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently the exit is unhandled, so guest halts with error if it tries
to execute INVD instruction. Call into emulator when INVD instruction
is executed by a guest instead. This instruction is not needed by ordinary
guests, but firmware (like OpenBIOS) use it and fail.
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Micro optimization to avoid calling wbinvd twice on the CPU that has to
emulate it. As we might be preempted between smp_call_function_many and
the local wbinvd, the cache might be filled again so that real work
could be done uselessly.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Currently x86's kvm_vm_ioctl_get_dirty_log() needs to allocate a bitmap by
vmalloc() which will be used in the next logging and this has been causing
bad effect to VGA and live-migration: vmalloc() consumes extra systime,
triggers tlb flush, etc.
This patch resolves this issue by pre-allocating one more bitmap and switching
between two bitmaps during dirty logging.
Performance improvement:
I measured performance for the case of VGA update by trace-cmd.
The result was 1.5 times faster than the original one.
In the case of live migration, the improvement ratio depends on the workload
and the guest memory size. In general, the larger the memory size is the more
benefits we get.
Note:
This does not change other architectures's logic but the allocation size
becomes twice. This will increase the actual memory consumption only when
the new size changes the number of pages allocated by vmalloc().
Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
As suggested by Andrea, pass r/w error code to gup(), upgrading read fault
to writable if host pte allows it.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
This can happen in the following scenario:
vcpu0 vcpu1
read fault
gup(.write=0)
gup(.write=1)
reuse swap cache, no COW
set writable spte
use writable spte
set read-only spte
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
The EPT present/writable bits use the same position as normal
pagetable bits.
Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting
the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte.
Also pass present/writable error code information from EPT violation
to generic pagefault handler.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
After an interrupt injection, the PPR changes, and we have to reflect that
into the vapic. This causes a KVM_REQ_EVENT to be set, which causes the
whole interrupt injection routine to be run again (harmlessly).
Optimize by only setting KVM_REQ_EVENT if the ppr was lowered; otherwise
there is no chance that a new injection is needed.
Signed-off-by: Avi Kivity <avi@redhat.com>
ldt is never used in the kernel context; same goes for fs (x86_64) and gs
(i386). So save/restore them in the heavyweight exit path instead
of the lightweight path.
By itself, this doesn't buy us much, but it paves the way for moving vmload
and vmsave to the heavyweight exit path, since they modify the same registers.
[jan: fix copy/pase mistake on i386]
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Saving guest registers is just a memory copy, and does not need to be in the
critical section. Move outside the critical section to improve latency a
bit.
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
May otherwise generates build warnings about unused
kvm_read_and_reset_pf_reason if included without CONFIG_KVM_GUEST
enabled.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
gcc 4.5 with some special options is able to duplicate the VMX
context switch asm in vmx_vcpu_run(). This results in a compile error
because the inline asm sequence uses an on local label. The non local
label is needed because other code wants to set up the return address.
This patch moves the asm code into an own function and marks
that explicitely noinline to avoid this problem.
Better would be probably to just move it into an .S file.
The diff looks worse than the change really is, it's all just
code movement and no logic change.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
It has no user outside mmu.c and also no prototype.
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If guest indicates that it can handle async pf in kernel mode too send
it, but only if interrupts are enabled.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If guest can detect that it runs in non-preemptable context it can
handle async PFs at any time, so let host know that it can send async
PF even if guest cpu is not in userspace.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
If async page fault is received by idle task or when preemp_count is
not zero guest cannot reschedule, so do sti; hlt and wait for page to be
ready. vcpu can still process interrupts while it waits for the page to
be ready.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Send async page fault to a PV guest if it accesses swapped out memory.
Guest will choose another task to run upon receiving the fault.
Allow async page fault injection only when guest is in user mode since
otherwise guest may be in non-sleepable context and will not be able
to reschedule.
Vcpu will be halted if guest will fault on the same page again or if
vcpu executes kernel code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
When async PF capability is detected hook up special page fault handler
that will handle async page fault events and bypass other page faults to
regular page fault handler. Also add async PF handling to nested SVM
emulation. Async PF always generates exit to L1 where vcpu thread will
be scheduled out until page is available.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Enable async PF in a guest if async PF capability is discovered.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Async PF also needs to hook into smp_prepare_boot_cpu so move the hook
into generic code.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Keep track of memslots changes by keeping generation number in memslots
structure. Provide kvm_write_guest_cached() function that skips
gfn_to_hva() translation if memslots was not changed since previous
invocation.
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>