include/linux/kvm.h defines struct kvm_dirty_log to
[...]
union {
void __user *dirty_bitmap; /* one bit per page */
__u64 padding;
};
__user requires compiler.h to compile. Currently, this works on x86
only coincidentally due to other include files. This patch makes
kvm.h compile in all cases.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
it will allow external users to call it. It is mainly
useful for routines that will override its machine_ops
field for its own special purposes, but want to call the
normal shutdown routine after they're done
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch a llows machine_crash_shutdown to
be replaced, just like any of the other functions
in machine_ops
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Hypercall based pte updates are faster than faults, and also allow use
of the lazy MMU mode to batch operations.
Don't report the feature if two dimensional paging is enabled.
[avi:
- one mmu_op hypercall instead of one per op
- allow 64-bit gpa on hypercall
- don't pass host errors (-ENOMEM) to guest]
[akpm: warning fix on i386]
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The patch moves the PIT model from userspace to kernel, and increases
the timer accuracy greatly.
[marcelo: make last_injected_time per-guest]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-and-Acked-by: Alex Davis <alex14641@yahoo.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
In the current inject_page_fault path KVM only checks if there is another PF
pending and injects a DF then. But it has to check for a pending DF too to
detect a shutdown condition in the VCPU. If this is not detected the VCPU goes
to a PF -> DF -> PF loop when it should triple fault. This patch detects this
condition and handles it with an KVM_SHUTDOWN exit to userspace.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Create large pages mappings if the guest PTE's are marked as such and
the underlying memory is hugetlbfs backed. If the largepage contains
write-protected pages, a large pte is not used.
Gives a consistent 2% improvement for data copies on ram mounted
filesystem, without NPT/EPT.
Anthony measures a 4% improvement on 4-way kernbench, with NPT.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Mark zapped root pagetables as invalid and ignore such pages during lookup.
This is a problem with the cr3-target feature, where a zapped root table fools
the faulting code into creating a read-only mapping. The result is a lockup
if the instruction can't be emulated.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writing to it periodically.
The area is binary compatible with xen, as we use the same shadow_info
structure.
[marcelo: fix bad_page on MSR_KVM_SYSTEM_TIME]
[avi: save full value of the msr, even if enable bit is clear]
[avi: clear previous value of time_page]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The load_pdptrs() function is required in the SVM module for NPT support.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
The generic x86 code has to know if the specific implementation uses Nested
Paging. In the generic code Nested Paging is called Two Dimensional Paging
(TDP) to avoid confusion with (future) TDP implementations of other vendors.
This patch exports the availability of TDP to the generic x86 code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch give the SVM and VMX implementations the ability to add some bits
the guest can set in its EFER register.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This allows kvm_host.h to be #included even when struct preempt_notifier is
undefined. This is needed to build ppc asm-offsets.h.
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
To allow TLB entries to be retained across VM entry and VM exit, the VMM
can now identify distinct address spaces through a new virtual-processor ID
(VPID) field of the VMCS.
[avi: drop vpid_sync_all()]
[avi: add "cc" to asm constraints]
Signed-off-by: Sheng Yang <sheng.yang@intel.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86-bigbox-bootmem-v3:
x86_64/mm: check and print vmemmap allocation continuous
x86_64: fix setup_node_bootmem to support big mem excluding with memmap
x86_64: make reserve_bootmem_generic() use new reserve_bootmem()
mm: allow reserve_bootmem() cross nodes
mm: offset align in alloc_bootmem()
mm: fix alloc_bootmem_core to use fast searching for all nodes
mm: make mem_map allocation continuous
On big systems with lots of memory, don't print out too much during
bootup, and make it easy to find if it is continuous.
on 256G 8 sockets system will get
[ffffe20000000000-ffffe20002bfffff] PMD -> [ffff810001400000-ffff810003ffffff] on node 0
[ffffe2001c700000-ffffe2001c7fffff] potential offnode page_structs
[ffffe20002c00000-ffffe2001c7fffff] PMD -> [ffff81000c000000-ffff8100255fffff] on node 0
[ffffe20038700000-ffffe200387fffff] potential offnode page_structs
[ffffe2001c800000-ffffe200387fffff] PMD -> [ffff810820200000-ffff81083c1fffff] on node 1
[ffffe20040000000-ffffe2007fffffff] PUD ->ffff811027a00000 on node 2
[ffffe20038800000-ffffe2003fffffff] PMD -> [ffff811020200000-ffff8110279fffff] on node 2
[ffffe20054700000-ffffe200547fffff] potential offnode page_structs
[ffffe20040000000-ffffe200547fffff] PMD -> [ffff811027c00000-ffff81103c3fffff] on node 2
[ffffe20070700000-ffffe200707fffff] potential offnode page_structs
[ffffe20054800000-ffffe200707fffff] PMD -> [ffff811820200000-ffff81183c1fffff] on node 3
[ffffe20080000000-ffffe200bfffffff] PUD ->ffff81202fa00000 on node 4
[ffffe20070800000-ffffe2007fffffff] PMD -> [ffff812020200000-ffff81202f9fffff] on node 4
[ffffe2008c700000-ffffe2008c7fffff] potential offnode page_structs
[ffffe20080000000-ffffe2008c7fffff] PMD -> [ffff81202fc00000-ffff81203c3fffff] on node 4
[ffffe200a8700000-ffffe200a87fffff] potential offnode page_structs
[ffffe2008c800000-ffffe200a87fffff] PMD -> [ffff812820200000-ffff81283c1fffff] on node 5
[ffffe200c0000000-ffffe200ffffffff] PUD ->ffff813037a00000 on node 6
[ffffe200a8800000-ffffe200bfffffff] PMD -> [ffff813020200000-ffff8130379fffff] on node 6
[ffffe200c4700000-ffffe200c47fffff] potential offnode page_structs
[ffffe200c0000000-ffffe200c47fffff] PMD -> [ffff813037c00000-ffff81303c3fffff] on node 6
[ffffe200c4800000-ffffe200e07fffff] PMD -> [ffff813820200000-ffff81383c1fffff] on node 7
instead of a very long print out...
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
typical case: four sockets system, every node has 4g ram, and we are using:
memmap=10g$4g
to mask out memory on node1 and node2
when numa is enabled, early_node_mem is used to get node_data and node_bootmap.
if it can not get memory from the same node with find_e820_area(), it will
use alloc_bootmem to get buff from previous nodes.
so check it and print out some info about it.
need to move early_res_to_bootmem into every setup_node_bootmem.
and it takes range that node has. otherwise alloc_bootmem could return addr
that reserved early.
depends on "mm: make reserve_bootmem can crossed the nodes".
Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-generic-bitops-v3:
x86, bitops: select the generic bitmap search functions
x86: include/asm-x86/pgalloc.h/bitops.h: checkpatch cleanups - formatting only
x86: finalize bitops unification
x86, UML: remove x86-specific implementations of find_first_bit
x86: optimize find_first_bit for small bitmaps
x86: switch 64-bit to generic find_first_bit
x86: generic versions of find_first_(zero_)bit, convert i386
bitops: use __fls for fls64 on 64-bit archs
generic: implement __fls on all 64-bit archs
generic: introduce a generic __fls implementation
x86: merge the simple bitops and move them to bitops.h
x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps
x86, uml: fix uml with generic find_next_bit for x86
x86: change x86 to use generic find_next_bit
uml: Kconfig cleanup
uml: fix build error
* Export ide_dma_exec_cmd() and __ide_dma_test_irq().
* Constify struct ide_dma_ops.
* Always set hwif->dma_ops to &sff_dma_ops in ide_setup_dma()
(it is later overriden by ide_init_port() if needed) and drop
'const struct ide_port_info *d' argument.
While at it:
* Rename __ide_dma_test_irq() to ide_dma_test_irq().
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add struct ide_dma_ops and convert core code + drivers to use it.
While at it:
* Drop "ide_" prefix from ->ide_dma_end and ->ide_dma_test_irq methods.
* Drop "ide_" "infixes" from DMA methods.
* au1xxx-ide.c:
- use auide_dma_{test_irq,end}() directly in auide_dma_timeout()
* pdc202xx_old.c:
- drop "old_" "infixes" from DMA methods
* siimage.c:
- add siimage_dma_test_irq() helper
- print SATA warning in siimage_init_one()
* Remove no longer needed ->init_hwif implementations.
v2:
* Changes based on review from Sergei:
- s/siimage_ide_dma_test_irq/siimage_dma_test_irq/
- s/drive->hwif/hwif/ in idefloppy_pc_intr().
- fix patch description w.r.t. au1xxx-ide changes
- fix au1xxx-ide build
- fix naming for cmd64*_dma_ops
- drop "ide_" and "old_" infixes
- s/hpt3xxx_dma_ops/hpt37x_dma_ops/
- s/hpt370x_dma_ops/hpt370_dma_ops/
- use correct DMA ops for HPT302/N, HPT371/N and HPT374
- s/it821x_smart_dma_ops/it821x_pass_through_dma_ops/
v3:
* Two bugs slipped in v2 (noticed by Sergei):
- use correct DMA ops for HPT374 (for real this time)
- handle HPT370/HPT370A properly
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add IDE_HFLAG_SERIALIZE_DMA host flag to serialize ports
if DMA is available and handle it in ide_init_port().
* Convert sl82c105 host driver to use this new flag.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Make ide_hwif_setup_dma() return an error value.
* Pass 'const struct ide_port_info *d' instead of 'unsigned long dmabase'
to ->init_dma method and make it return an error value.
* Rename ide_get_or_set_dma_base() to ide_pci_dma_base(),
change ordering of its arguments and then export it.
* Export ide_pci_set_master().
* Do complete DMA setup inside ->init_dma method and update ->init_dma
users accordingly.
* Sanitize code for DMA setup in ide_init_port().
v2:
* Fix for CONFIG_BLK_DEV_IDEDMA_PCI=n configs
(from Jiri Slaby <jirislaby@gmail.com>):
Fix following compiler warning by returning EINVAL:
In file included from ANYTHING-INCLUDING-IDE.H:45:
include/linux/ide.h: In function ‘ide_hwif_setup_dma’:
include/linux/ide.h:1022: warning: no return statement in function returning non-void
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Always use "fast" MWDMA support and remove dma_{black,white}_list
(they were based on completely bogus ->ide_dma_check implementation
which didn't set neither the host controller timings nor the device
for the desired transfer mode).
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Export ide_allocate_dma_engine() and use it in trm290 host driver.
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Use hwif->name instead of cds->name in ide_allocate_dma_engine().
* Use pci_name(dev) instead of cds->name in init_dma_pdc202xx().
* Remove no longer needed ->cds field from ide_hwif_t.
v2:
* scc_pata.c also needs to be updated now (noticed by Stephen Rothwell).
There should be no functional changes caused by this patch.
Cc: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>
Cc: Akira Iguchi <akira2.iguchi@toshiba.co.jp>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Always setup hwif->extra_base in ide_iomio_dma() and remove
no longer needed ->extra field from struct ide_port_info.
There should be no functional changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Reserve PCI BARs 0-3 (0-1 for single port controllers) in
ide_pci_enable() and remove ide_hwif_request_regions() call
from ide_device_add_all() (also cleanup resource management
in scc_pata host driver).
* Fix handling of PCI BAR 4 in ide_pci_enable(), then cleanup
ide_iomio_dma() (+ init_hwif_trm290() in trm290 host driver)
and remove ide_release[_iomio]_dma().
v2:
* Fixup trm290 host driver.
v3:
* Because of scc_pata host driver changes we need to call
pci_request_selected_regions() also in setup_mmio_scc().
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
All host drivers using ide_unregister()/module_exit() have been fixed
to manage resources themselves so this function can be removed now.
There should be no functional changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Add 'unsigned long config' argument to ide_legacy_device_add()
for setting hwif->config_data.
* Use ide_find_port_slot() instead of ide_find_port() in
ide_legacy_device_add().
* Handle IDE_HFLAG_QD_2ND_PORT and IDE_HFLAG_SINGLE host flags in
ide_legacy_device_add().
* Convert qd65xx host driver to use ide_legacy_device_add().
v2:
* Update ali14xx, dtc2278, ht6560b and umc8672 host drivers.
There should be no functional changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Add ide_legacy_device_add() helper for use by legacy VLB host drivers
(+ convert them to use it).
There should be no functional changes caused by this patch.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Update IDE PMAC host driver to use drive->noprobe instead of hwif->noprobe
and remove hwif->noprobe completely (it is always set to zero now).
There should be no functional changes caused by this patch.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* Move hooks for port/host specific methods from ide_hwif_t to
'struct ide_port_ops'.
* Add 'const struct ide_port_ops *port_ops' to 'struct ide_port_info'
and ide_hwif_t.
* Update host drivers and core code accordingly.
While at it:
* Rename ata66_*() cable detect functions to *_cable_detect() to match
the standard naming. (Suggested by Sergei Shtylyov)
v2:
* Fix build for bast-ide. (Noticed by Andrew Morton)
Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This patch adds a field of 64-bit physical pointer to NULL terminated
single linked list of struct setup_data to real-mode kernel
header. This is used as a more extensible boot parameters passing
mechanism.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add free_early to early reservation mechanism - this way early bootup
failure paths can stop wasting memory.
Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
include/asm-x86/bitops_32.h and include/asm-x86/bitops_64.h are now
almost identical. The 64-bit version sets ARCH_HAS_FAST_MULTIPLIER
and has an extra inline function set_bit_string. The define currently
has no influence on the generated code, but it can be argued that
setting it on i386 is the right thing to do anyhow. The addition
of the extra inline function on i386 does not hurt either.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>