linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-16 00:52:01 +00:00

Author	SHA1	Message	Date
Randy Dunlap	be52178b9f	[NET] skbuff: fix kernel-doc Fix skbuff.h kernel-doc: linux-2.6.21-git4//include/linux/skbuff.h:316): No description found for parameter 'transport_header' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:16:20 -07:00
David Howells	ef4533f8af	[AFS]: Make the match_() functions take const options. Make the match_() functions take a const pointer to the options table and make strings pointers in the options table const too. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:10:39 -07:00
Eric Dumazet	709525fad8	[IPV6]: Get rid of __HAVE_ARCH_ADDR_SET. __HAVE_ARCH_ADDR_SET seems unused these days, just get rid of it. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-05-03 03:08:43 -07:00
Avi Kivity	2ff81f70b5	KVM: Remove unused 'instruction_length' As we no longer emulate in userspace, this is meaningless. We don't compute it on SVM anyway. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:32 +03:00
Avi Kivity	02c8320972	KVM: Don't require explicit indication of completion of mmio or pio It is illegal not to return from a pio or mmio request without completing it, as mmio or pio is an atomic operation. Therefore, we can simplify the userspace interface by avoiding the completion indication. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:32 +03:00
Avi Kivity	b8836737d9	KVM: Add fpu get/set operations These are really helpful when migrating an floating point app to another machine. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Avi Kivity	e8207547d2	KVM: Add physical memory aliasing feature With this, we can specify that accesses to one physical memory range will be remapped to another. This is useful for the vga window at 0xa0000 which is used as a movable window into the (much larger) framebuffer. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Avi Kivity	039576c03c	KVM: Avoid guest virtual addresses in string pio userspace interface The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	07c45a366d	KVM: Allow kernel to select size of mmap() buffer This allows us to store offsets in the kernel/user kvm_run area, and be sure that userspace has them mapped. As offsets can be outside the kvm_run struct, userspace has no way of knowing how much to mmap. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	1961d276c8	KVM: Add guest mode signal mask Allow a special signal mask to be used while executing in guest mode. This allows signals to be used to interrupt a vcpu without requiring signal delivery to a userspace handler, which is quite expensive. Userspace still receives -EINTR and can get the signal via sigwait(). Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	1b19f3e61d	KVM: Add a special exit reason when exiting due to an interrupt This is redundant, as we also return -EINTR from the ioctl, but it allows us to examine the exit_reason field on resume without seeing old data. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	8eb7d334bd	KVM: Fold kvm_run::exit_type into kvm_run::exit_reason Currently, userspace is told about the nature of the last exit from the guest using two fields, exit_type and exit_reason, where exit_type has just two enumerations (and no need for more). So fold exit_type into exit_reason, reducing the complexity of determining what really happened. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	b4e63f560b	KVM: Allow userspace to process hypercalls which have no kernel handler This is useful for paravirtualized graphics devices, for example. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	5d308f4550	KVM: Add method to check for backwards-compatible API extensions Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:24 +03:00
Avi Kivity	739872c56f	KVM: Renumber ioctls The recent changes have left the ioctl numbers in complete disarray. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	2a4dac3952	KVM: Remove minor wart from KVM_CREATE_VCPU ioctl That ioctl does not transfer any data, so it should be an _IO rather than an _IOW. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	106b552b43	KVM: Remove the 'emulated' field from the userspace interface We no longer emulate single instructions in userspace. Instead, we service mmio or pio requests. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	06465c5a3a	KVM: Handle cpuid in the kernel instead of punting to userspace KVM used to handle cpuid by letting userspace decide what values to return to the guest. We now handle cpuid completely in the kernel. We still let userspace decide which values the guest will see by having userspace set up the value table beforehand (this is necessary to allow management software to set the cpu features to the least common denominator, so that live migration can work). The motivation for the change is that kvm kernel code can be impacted by cpuid features, for example the x86 emulator. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	46fc147788	KVM: Do not communicate to userspace through cpu registers during PIO Currently when passing the a PIO emulation request to userspace, we rely on userspace updating %rax (on 'in' instructions) and %rsi/%rdi/%rcx (on string instructions). This (a) requires two extra ioctls for getting and setting the registers and (b) is unfriendly to non-x86 archs, when they get kvm ports. So fix by doing the register fixups in the kernel and passing to userspace only an abstract description of the PIO to be done. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	9a2bb7f486	KVM: Use a shared page for kernel/user communication when runing a vcpu Instead of passing a 'struct kvm_run' back and forth between the kernel and userspace, allocate a page and allow the user to mmap() it. This reduces needless copying and makes the interface expandable by providing lots of free space. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	ff42697436	KVM: Export <linux/kvm.h> This allows users to actually build prgrams that use kvm without the entire source tree. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:22 +03:00
Avi Kivity	bbe4432e66	KVM: Use own minor number Use the minor number (232) allocated to kvm by lanana. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:22 +03:00
Adrian Bunk	ecf36501bc	PCI: the overdue removal of pci_module_init() Unless we finally completely remove it, people will always add new users. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Adrian Bunk	5adc55da4a	PCI: remove the broken PCI_MULTITHREAD_PROBE option This patch removes the PCI_MULTITHREAD_PROBE option that had already been marked as broken. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Michael Ellerman	032de8e2fe	MSI: Give archs the option to free all MSI/Xs at once. This patch introduces an optional function, arch_teardown_msi_irqs(), which gives an arch the opportunity to do per-device teardown for MSI/X. If that's not required, the default version simply calls arch_teardown_msi_irq() for each msi irq required. arch_teardown_msi_irqs() is simply passed a pdev, attached to the pdev is a list of msi_descs, it is up to the arch to free the irq associated with each of these as appropriate. For archs that _don't_ implement arch_teardown_msi_irqs(), all msi_descs with irq == 0 are considered unallocated, and the arch teardown routine is not called on them. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Michael Ellerman	9c8313343c	MSI: Give archs the option to allocate all MSI/Xs at once. This patch introduces an optional function, arch_setup_msi_irqs(), (note the plural) which gives an arch the opportunity to do per-device setup for MSI/X and then allocate all the requested MSI/Xs at once. If that's not required by the arch, the default version simply calls arch_setup_msi_irq() for each MSI irq required. arch_setup_msi_irqs() is passed a pdev, attached to the pdev is a list of msi_descs with irq == 0, it is up to the arch to connect these up to an irq (via set_irq_msi()) or return an error. For convenience the number of vectors and the type are passed also. All msi_descs with irq != 0 are considered allocated, and the arch teardown routine will be called on them when necessary. The existing semantics of pci_enable_msix() are that if the requested number of irqs can not be allocated, the maximum number that _could_ be allocated is returned. To support that, we define that in case of an error from arch_setup_msi_irqs(), the number of msi_descs with irq != 0 are considered allocated, and are counted toward the "max that could be allocated". Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:38 -07:00
Michael Ellerman	314e77b3ee	MSI: Remove dev->first_msi_irq Now that we keep a list of msi descriptors, we don't need first_msi_irq in the pci dev. If we somehow have zero MSIs configured list_entry() will give us weird oopes or nice memory corruption bugs. So be paranoid. Add BUG_ONs and also a check in pci_msi_check_device() to make sure nvec > 0. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Michael Ellerman	4aa9bc955d	MSI: Use a list instead of the custom link structure The msi descriptors are linked together with what looks a lot like a linked list, but isn't a struct list_head list. Make it one. The only complication is that previously we walked a list of irqs, and got the descriptor for each with get_irq_msi(). Now we have a list of descriptors and need to get the irq out of it, so it needs to be in the actual struct msi_desc. We use 0 to indicate no irq is setup. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Michael Ellerman	65891215e6	PCI: Create alloc_pci_dev(), the one true way to create a struct pci_dev There are currently several places in the kernel where we kmalloc() a struct pci_dev and start initialising it. It'd be preferable to have an allocator so we can ensure the pci_dev is correctly initialised in one place. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Michael Ellerman	c9953a73e9	MSI: Add an arch_msi_check_device() Add an arch_check_device(), which gives archs a chance to check the input to pci_enable_msi/x. The arch might be interested in the value of nvec so pass it in. Propagate the error value returned from the arch routine out to the caller. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:37 -07:00
Sergei Shtylyov	0da0ead901	PCI: define pci_request/release_regions() for CONFIG_PCI=n Balance declarations of pci_request_regions() and pci_release_regions() with empty inline definitions for the CONFIG_PCI=n case -- otherwise my patch to drivers/net/3c59x.c in the -mm tree doesn't compile. :-) Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:35 -07:00
Jean Delvare	6473d160b4	PCI: Cleanup the includes of <linux/pci.h> I noticed that many source files include <linux/pci.h> while they do not appear to need it. Here is an attempt to clean it all up. In order to find all possibly affected files, I searched for all files including <linux/pci.h> but without any other occurence of "pci" or "PCI". I removed the include statement from all of these, then I compiled an allmodconfig kernel on both i386 and x86_64 and fixed the false positives manually. My tests covered 66% of the affected files, so there could be false positives remaining. Untested files are: arch/alpha/kernel/err_common.c arch/alpha/kernel/err_ev6.c arch/alpha/kernel/err_ev7.c arch/ia64/sn/kernel/huberror.c arch/ia64/sn/kernel/xpnet.c arch/m68knommu/kernel/dma.c arch/mips/lib/iomap.c arch/powerpc/platforms/pseries/ras.c arch/ppc/8260_io/enet.c arch/ppc/8260_io/fcc_enet.c arch/ppc/8xx_io/enet.c arch/ppc/syslib/ppc4xx_sgdma.c arch/sh64/mach-cayman/iomap.c arch/xtensa/kernel/xtensa_ksyms.c arch/xtensa/platform-iss/setup.c drivers/i2c/busses/i2c-at91.c drivers/i2c/busses/i2c-mpc.c drivers/media/video/saa711x.c drivers/misc/hdpuftrs/hdpu_cpustate.c drivers/misc/hdpuftrs/hdpu_nexus.c drivers/net/au1000_eth.c drivers/net/fec_8xx/fec_main.c drivers/net/fec_8xx/fec_mii.c drivers/net/fs_enet/fs_enet-main.c drivers/net/fs_enet/mac-fcc.c drivers/net/fs_enet/mac-fec.c drivers/net/fs_enet/mac-scc.c drivers/net/fs_enet/mii-bitbang.c drivers/net/fs_enet/mii-fec.c drivers/net/ibm_emac/ibm_emac_core.c drivers/net/lasi_82596.c drivers/parisc/hppb.c drivers/sbus/sbus.c drivers/video/g364fb.c drivers/video/platinumfb.c drivers/video/stifb.c drivers/video/valkyriefb.c include/asm-arm/arch-ixp4xx/dma.h sound/oss/au1550_ac97.c I would welcome test reports for these files. I am fine with removing the untested files from the patch if the general opinion is that these changes aren't safe. The tested part would still be nice to have. Note that this patch depends on another header fixup patch I submitted to LKML yesterday: [PATCH] scatterlist.h needs types.h http://lkml.org/lkml/2007/3/01/141 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:35 -07:00
Jean Delvare	a9dfd281a7	PCI: scatterlist.h needs types.h Most architectures' scatterlist.h use the type dma_addr_t, but omit to include <asm/types.h> which defines it. This could lead to build failures, so let's add the missing includes. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:34 -07:00
Brian King	f7bdd12d23	pci: New PCI-E reset API Adds a new API which can be used to issue various types of PCI-E reset, including PCI-E warm reset and PCI-E hot reset. This is needed for an ipr PCI-E adapter which does not properly implement BIST. Running BIST on this adapter results in PCI-E errors. The only reliable reset mechanism that exists on this hardware is PCI Fundamental reset (warm reset). Since driving this type of reset is architecture unique, this provides the necessary hooks for architectures to add this support. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:34 -07:00
Greg Kroah-Hartman	823bccfc40	remove "struct subsystem" as it is no longer needed We need to work on cleaning up the relationship between kobjects, ksets and ktypes. The removal of 'struct subsystem' is the first step of this, especially as it is not really needed at all. Thanks to Kay for fixing the bugs in this patch. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 18:57:59 -07:00
Sam Ravnborg	dc24f0e708	kbuild: remove dependency on input.h from file2alias Almost all definitions used by file2alias was already present in mod_devicetable.h. Added the last definition and killed the input.h usage. The errornous include was pointed out by: Jan Engelhardt <jengelh@linux01.gwdg.de> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Deepak Saxena <dsaxena@plexity.net>	2007-05-02 20:58:08 +02:00
Jan Kiszka	c41bf8fa5e	[PATCH] i386: avoid redundant preempt_disable in __unlazy_fpu There are two callers of __unlazy_fpu, unlazy_fpu and __switch_to, and none of them appear to require additional preempt_disable/enable here. Let's open-code save_init_fpu in __unlazy_fpu to save a few ops. Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Jan Kiszka	02b64dab56	[PATCH] i386: white space fixes in i387.h Signed-off-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	c5bcb5635a	[PATCH] x86: Use RDTSCP for synchronous get_cycles if possible RDTSCP is already synchronous and doesn't need an explicit CPUID. This is a little faster and more importantly avoids VMEXITs on Hypervisors. Original patch from Joerg Roedel, but reworked by AK Also includes miscompilation fix by Eric Biederman Cc: "Joerg Roedel" <joerg.roedel@amd.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:21 +02:00
Andi Kleen	9bccb23dc5	[PATCH] i386: Add X86_FEATURE_RDTSCP Following x86-64 Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	3aefbe0746	[PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 Syncs up with x86-64. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	e859dc553c	[PATCH] i386: Implement alternative_io for i386 Ported from x86-64. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	3671df8572	[PATCH] i386: Evaluate constant cpu features at runtime Redefine cpu_has() to evaluate cpu features already checked in early boot at compile time. This way the compiler might eliminate some dead code. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	c7f81c9453	[PATCH] i386: Verify important CPUID bits in real mode Check some CPUID bits that are needed for compiler generated early in boot. When the system is still in real mode before changing the VESA BIOS mode it is possible to still display an visible error message on the screen. Similar to x86-64. Includes cleanups from Eric Biederman Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	05cb007dac	[PATCH] x86-64: Use the 32bit wd_ops for 64bit too. This mainly removes a lot of code, replacing it with calls into the new 32bit perfctr-watchdog.c Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Andi Kleen	09198e6850	[PATCH] i386: Clean up NMI watchdog code - Introduce a wd_ops structure - Convert the various nmi watchdogs over to it - This allows to split the perfctr reservation from the watchdog setup cleanly. - Do perfctr reservation globally as it should have always been - Remove dead code referenced only by unused EXPORT_SYMBOLs Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:20 +02:00
Zachary Amsden	9e5e3162b2	[PATCH] i386: pte simplify ops Add comment and condense code to make use of native_local_ptep_get_and_clear function. Also, it turns out the 2-level and 3-level paging definitions were identical, so move the common definition into pgtable.h Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:19 +02:00
Zachary Amsden	142dd97591	[PATCH] i386: pte xchg optimization In situations where page table updates need only be made locally, and there is no cross-processor A/D bit races involved, we need not use the heavyweight xchg instruction to atomically fetch and clear page table entries. Instead, we can just read and clear them directly. This introduces a neat optimization for non-SMP kernels; drop the atomic xchg operations from page table updates. Thanks to Michel Lespinasse for noting this potential optimization. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:19 +02:00
Zachary Amsden	c2c1accd4b	[PATCH] i386: pte clear optimization When exiting from an address space, no special hypervisor notification of page table updates needs to occur; direct page table hypervisors, such as Xen, switch to another address space first (init_mm) and unprotects the page tables to avoid the cost of trapping to the hypervisor for each pte_clear. Shadow mode hypervisors, such as VMI and lhype don't need to do the extra work of calling through paravirt-ops, and can just directly clear the page table entries without notifiying the hypervisor, since all the page tables are about to be freed. So introduce native_pte_clear functions which bypass any paravirt-ops notification. This results in a significant performance win for VMI and removes some indirect calls from zap_pte_range. Note the 3-level paging already had a native_pte_clear function, thus demanding argument conformance and extra args for the 2-level definition. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:19 +02:00
Andi Kleen	57a4f91ae5	[PATCH] x86-64: Auto compute __NR_syscall_max at compile time No need to maintain it anymore Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:18 +02:00
Fernando Luis [ ISO-8859-1 charset ] V�zquezCao	70ae77f497	[PATCH] x86-64: Use safe_apic_wait_icr_idle in __send_IPI_dest_field - x86_64 Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is NMI_VECTOR to avoid potential hangups in the event of crash when kdump tries to stop the other CPUs. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:18 +02:00
Fernando Luis [ ISO-8859-1 charset ] V�zquezCao	9062d888aa	[PATCH] x86-64: __send_IPI_dest_field - x86_64 Implement __send_IPI_dest_field which can be used to send IPIs when the "destination shorthand" field of the ICR is set to 00 (destination field). Use it whenever possible. Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:18 +02:00
Fernando Luis VazquezCao	8339e9fba3	[PATCH] x86-64: safe_apic_wait_icr_idle - x86_64 apic_wait_icr_idle looks like this: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } The busy loop in this function would not be problematic if the corresponding status bit in the ICR were always updated, but that does not seem to be the case under certain crash scenarios. Kdump uses an IPI to stop the other CPUs in the event of a crash, but when any of the other CPUs are locked-up inside the NMI handler the CPU that sends the IPI will end up looping forever in the ICR check, effectively hard-locking the whole system. Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: "A local APIC unit indicates successful dispatch of an IPI by resetting the Delivery Status bit in the Interrupt Command Register (ICR). The operating system polls the delivery status bit after sending an INIT or STARTUP IPI until the command has been dispatched. A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions. If the IPI is not successfully dispatched, the operating system can abort the command. Alternatively, the operating system can retry the IPI by writing the lower 32-bit double word of the ICR. This “time-out” mechanism can be implemented through an external interrupt, if interrupts are enabled on the processor, or through execution of an instruction or time-stamp counter spin loop." Intel's documentation suggests the implementation of a time-out mechanism, which, by the way, is already being open-coded in some parts of the kernel that tinker with ICR. Create a apic_wait_icr_idle replacement that implements the time-out mechanism and that can be used to solve the aforementioned problem. AK: moved both functions out of line AK: Added improved loop from Keith Owens Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:17 +02:00
Fernando Luis VazquezCao	f2b218dd61	[PATCH] i386: safe_apic_wait_icr_idle - i386 apic_wait_icr_idle looks like this: static __inline__ void apic_wait_icr_idle(void) { while (apic_read(APIC_ICR) & APIC_ICR_BUSY) cpu_relax(); } The busy loop in this function would not be problematic if the corresponding status bit in the ICR were always updated, but that does not seem to be the case under certain crash scenarios. Kdump uses an IPI to stop the other CPUs in the event of a crash, but when any of the other CPUs are locked-up inside the NMI handler the CPU that sends the IPI will end up looping forever in the ICR check, effectively hard-locking the whole system. Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3: "A local APIC unit indicates successful dispatch of an IPI by resetting the Delivery Status bit in the Interrupt Command Register (ICR). The operating system polls the delivery status bit after sending an INIT or STARTUP IPI until the command has been dispatched. A period of 20 microseconds should be sufficient for IPI dispatch to complete under normal operating conditions. If the IPI is not successfully dispatched, the operating system can abort the command. Alternatively, the operating system can retry the IPI by writing the lower 32-bit double word of the ICR. This “time-out” mechanism can be implemented through an external interrupt, if interrupts are enabled on the processor, or through execution of an instruction or time-stamp counter spin loop." Intel's documentation suggests the implementation of a time-out mechanism, which, by the way, is already being open-coded in some parts of the kernel that tinker with ICR. Create a apic_wait_icr_idle replacement that implements the time-out mechanism and that can be used to solve the aforementioned problem. AK: moved both functions out of line AK: added improved loop from Keith Owens Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:17 +02:00
Bernhard Kaindl	de938c51d5	[PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync If our copy of the MTRRs of the BSP has RdMem or WrMem set, and we are running on an AMD64/K8 system, the boot CPU must have had MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would have copied these bits cleared), so we set them on this CPU as well. This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync across the CPUs of SMP systems in order to fullfill the duty of system software to "initialize and maintain MTRR consistency across all processors." as written in the AMD and Intel manuals. If an WRMSR instruction fails because MtrrFixDramModEn is not set, I expect that also the Intel-style MTRR bits are not updated. AK: minor cleanup, moved MSR defines around Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>	2007-05-02 19:27:17 +02:00
Bernhard Kaindl	2b1f6278d7	[PATCH] x86: Save the MTRRs of the BSP before booting an AP Applied fix by Andew Morton: http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'. AMD and Intel x86 CPU manuals state that it is the responsibility of system software to initialize and maintain MTRR consistency across all processors in Multi-Processing Environments. Quote from page 188 of the AMD64 System Programming manual (Volume 2): 7.6.5 MTRRs in Multi-Processing Environments "In multi-processing environments, the MTRRs located in all processors must characterize memory in the same way. Generally, this means that identical values are written to the MTRRs used by the processors." (short omission here) "Failure to do so may result in coherency violations or loss of atomicity. Processor implementations do not check the MTRR settings in other processors to ensure consistency. It is the responsibility of system software to initialize and maintain MTRR consistency across all processors." Current Linux MTRR code already implements the above in the case that the BIOS does not properly initialize MTRRs on the secondary processors, but the case where the fixed-range MTRRs of the boot processor are changed after Linux started to boot, before the initialsation of a secondary processor, is not handled yet. In this case, secondary processors are currently initialized by Linux with MTRRs which the boot processor had very early, when mtrr_bp_init() did run, but not with the MTRRs which the boot processor uses at the time when that secondary processors is actually booted, causing differing MTRR contents on the secondary processors. Such situation happens on Acer Ferrari 1000 and 5000 notebooks where the BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs of the boot processor when it transitions the system into ACPI mode. The SMI handler of the BIOS does this in SMM, entered while Linux ACPI code runs acpi_enable(). Other occasions where the SMI handler of the BIOS may change bits in the MTRRs could occur as well. To initialize newly booted secodary processors with the fixed-range MTRRs which the boot processor uses at that time, this patch saves the fixed-range MTRRs of the boot processor before new secondary processors are started. When the secondary processors run their Linux initialisation code, their fixed-range MTRRs will be updated with the saved fixed-range MTRRs. If CONFIG_MTRR is not set, we define mtrr_save_state as an empty statement because there is nothing to do. Possible TODOs: ) CPU-hotplugging outside of SMP suspend/resume is not yet tested with this patch. ) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up, then the calls to mtrr_save_state() could be replaced by calls to mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be needed. That would need either verification of the CPU-hotplug code or at least a test on a >2 CPU machine. *) The MTRRs of other running processors are not yet checked at this time but it might be interesting to syncronize the MTTRs of all processors before booting. That would be an incremental patch, but of rather low priority since there is no machine known so far which would require this. AK: moved prototypes on x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>	2007-05-02 19:27:17 +02:00
Bernhard Kaindl	2b3b4835c9	[PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches. In this current implementation which is used in other patches, mtrr_save_fixed_ranges() accepts a dummy void pointer because in the current implementation of one of these patches, this function may be called from smp_call_function_single() which requires that this function takes a void pointer argument. This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges which is the element of the static struct which stores our current backup of the fixed-range MTRR values which all CPUs shall be using. Because mtrr_save_fixed_ranges calls get_fixed_ranges after kernel initialisation time, __init needs to be removed from the declaration of get_fixed_ranges(). If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges as an empty statement because there is nothing to do. AK: Moved prototypes for x86-64 around to fix warnings Signed-off-by: Bernhard Kaindl <bk@suse.de> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk>	2007-05-02 19:27:17 +02:00
Andi Kleen	856f44ff4a	[PATCH] x86-64: Move mtrr prototypes from proto.h to mtrr.h Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge	03df4f6ee9	[PATCH] i386: Clean up ELF note generation Three cleanups: 1: ELF notes are never mapped, so there's no need to have any access flags in their phdr. 2: When generating them from asm, tell the assembler to use a SHT_NOTE section type. There doesn't seem to be a way to do this from C. 3: Use ANSI rather than traditional cpp behaviour to stringify the macro argument. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Eric W. Biederman <ebiederm@xmission.com>	2007-05-02 19:27:17 +02:00
Jeremy Fitzhardinge	441d40dca0	[PATCH] x86: PARAVIRT: Jeremy Fitzhardinge <jeremy@goop.org> The other symbols used to delineate the alt-instructions sections have the form __foo/__foo_end. Rename parainstructions to match. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:16 +02:00
Zachary Amsden	e0bb864397	[PATCH] i386: Convert VMI timer to use clock events Convert VMI timer to use clock events, making it properly able to use the NO_HZ infrastructure. On UP systems, with no local APIC, we just continue to route these events through the PIT. On systems with a local APIC, or SMP, we provide a single source interrupt chip which creates the local timer IRQ. It actually gets delivered by the APIC hardware, but we don't want to use the same local APIC clocksource processing, so we create our own handler here. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de> CC: Dan Hecht <dhecht@vmware.com> CC: Ingo Molnar <mingo@elte.hu> CC: Thomas Gleixner <tglx@linutronix.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	57decbda6a	[PATCH] x86: update for i386 and x86-64 check_bugs Remove spurious comments, headers and keywords from x86-64 bugs.[ch]. Use identify_boot_cpu() AK: merged with other patch Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	c5413fbe89	[PATCH] i386: Fix UP gdt bugs Fixes two problems with the GDT when compiling for uniprocessor: - There's no percpu segment, so trying to load its selector into %fs fails. Use a null selector instead. - The real gdt needs to be loaded at some point. Do it in cpu_init(). Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	1956c73bb5	[PATCH] i386: Define per_cpu_offset Define per_cpu_offset in asm-i386/percpu.h when SMP defined, like asm-generic/percpu.h does for UP. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	978c038ec9	[PATCH] i386: cleanups to help using per-cpu variables from asm This patch does a few small cleanups: - use PER_CPU_NAME to generate the names of per-cpu variables - use lea to add the per_cpu offset in PER_CPU(), because it doesn't affect condition flags - add PER_CPU_VAR which allows direct access to pre-cpu variables with the %fs: prefix on SMP. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	7c3576d261	[PATCH] i386: Convert PDA into the percpu section Currently x86 (similar to x84-64) has a special per-cpu structure called "i386_pda" which can be easily and efficiently referenced via the %fs register. An ELF section is more flexible than a structure, allowing any piece of code to use this area. Indeed, such a section already exists: the per-cpu area. So this patch: (1) Removes the PDA and uses per-cpu variables for each current member. (2) Replaces the __KERNEL_PDA segment with __KERNEL_PERCPU. (3) Creates a per-cpu mirror of __per_cpu_offset called this_cpu_off, which can be used to calculate addresses for this CPU's variables. (4) Simplifies startup, because %fs doesn't need to be loaded with a special segment at early boot; it can be deferred until the first percpu area is allocated (or never for UP). The result is less code and one less x86-specific concept. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:16 +02:00
Jeremy Fitzhardinge	7a61d35d4b	[PATCH] i386: Page-align the GDT Xen wants a dedicated page for the GDT. I believe VMI likes it too. lguest, KVM and native don't care. Simple transformation to page-aligned "struct gdt_page". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Jeremy Fitzhardinge <jeremy@xensource.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	4cdd9c8931	[PATCH] i386: PARAVIRT: drop unused ptep_get_and_clear In shadow mode hypervisors, ptep_get_and_clear achieves the desired purpose of keeping the shadows in sync by issuing a native_get_and_clear, followed by a call to pte_update, which indicates the PTE has been modified. Direct mode hypervisors (Xen) have no need for this anyway, and will trap the update using writable pagetables. This means no hypervisor makes use of ptep_get_and_clear; there is no reason to have it in the paravirt-ops structure. Change confusing terminology about raw vs. native functions into consistent use of native_pte_xxx for operations which do not invoke paravirt-ops. Signed-off-by: Zachary Amsden <zach@vmware.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	1a45b7aaa5	[PATCH] i386: PARAVIRT: Clean up paravirt patchable wrappers Replace all the open-coded macros for generating calls with a pair of more general macros (__PVOP_CALL/VCALL), and redefine all the PVOP_V?CALL[0-4] in terms of them. [ Andrew, Andi: this should slot in immediately after "Document asm-i386/paravirt.h" (paravirt_ops-document-asm-i386-paravirth.patch) ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	4e0fa85602	[PATCH] i386: PARAVIRT: Use enums for paravirt lazy flush modi Remove #defines, add enum for PARAVIRT_LAZY_FLUSH. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	ce6234b529	[PATCH] i386: PARAVIRT: add kmap_atomic_pte for mapping highpte pages Xen and VMI both have special requirements when mapping a highmem pte page into the kernel address space. These can be dealt with by adding a new kmap_atomic_pte() function for mapping highptes, and hooking it into the paravirt_ops infrastructure. Xen specifically wants to map the pte page RO, so this patch exposes a helper function, kmap_atomic_prot, which maps the page with the specified page protections. This also adds a kmap_flush_unused() function to clear out the cached kmap mappings. Xen needs this to clear out any potential stray RW mappings of pages which will become part of a pagetable. [ Zach - vmi.c will need some attention after this patch. It wasn't immediately obvious to me what needs to be done. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	a27fe809b8	[PATCH] i386: PARAVIRT: revert map_pt_hook. Back out the map_pt_hook to clear the way for kmap_atomic_pte. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	d4c104771a	[PATCH] i386: PARAVIRT: add flush_tlb_others paravirt_op This patch adds a pv_op for flush_tlb_others. Linux running on native hardware uses cross-CPU IPIs to flush the TLB on any CPU which may have a particular mm's pagetable entries cached in its TLB. This is inefficient in a paravirtualized environment, since the hypervisor knows which real CPUs actually contain cached mappings, which may be a small subset of a guest's VCPUs. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:15 +02:00
Jeremy Fitzhardinge	63f70270cc	[PATCH] i386: PARAVIRT: add common patching machinery Implement the actual patching machinery. paravirt_patch_default() contains the logic to automatically patch a callsite based on a few simple rules: - if the paravirt_op function is paravirt_nop, then patch nops - if the paravirt_op function is a jmp target, then jmp to it - if the paravirt_op function is callable and doesn't clobber too much for the callsite, call it directly paravirt_patch_default is suitable as a default implementation of paravirt_ops.patch, will remove most of the expensive indirect calls in favour of either a direct call or a pile of nops. Backends may implement their own patcher, however. There are several helper functions to help with this: paravirt_patch_nop nop out a callsite paravirt_patch_ignore leave the callsite as-is paravirt_patch_call patch a call if the caller and callee have compatible clobbers paravirt_patch_jmp patch in a jmp paravirt_patch_insns patch some literal instructions over the callsite, if they fit This patch also implements more direct patches for the native case, so that when running on native hardware many common operations are implemented inline. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	294688c028	[PATCH] i386: PARAVIRT: Document asm-i386/paravirt.h Clean things up, and broadly document: - the paravirt_ops functions themselves - the patching mechanism Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	f8822f4201	[PATCH] i386: PARAVIRT: Consistently wrap paravirt ops callsites to make them patchable Wrap a set of interesting paravirt_ops calls in a wrapper which makes the callsites available for patching. Unfortunately this is pretty ugly because there's no way to get gcc to generate a function call, but also wrap just the callsite itself with the necessary labels. This patch supports functions with 0-4 arguments, and either void or returning a value. 64-bit arguments must be split into a pair of 32-bit arguments (lower word first). Small structures are returned in registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com> Cc: Anthony Liguori <anthony@codemonkey.ws>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	42c24fa22e	[PATCH] i386: PARAVIRT: Fix patch site clobbers to include return register Fix a few clobbers to include the return register. The clobbers set is the set of all registers modified (or may be modified) by the code snippet, regardless of whether it was deliberate or accidental. Also, make sure that callsites which are used in contexts which don't allow clobbers actually save and restore all clobberable registers. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	d582203578	[PATCH] i386: PARAVIRT: Use patch site IDs computed from offset in paravirt_ops structure Use patch type identifiers derived from the offset of the operation in the paravirt_ops structure. This avoids having to maintain a separate enum for patch site types. Also, since the identifier is derived from the offset into paravirt_ops, the offset can be derived from the identifier. This is used to remove replicated information in the various callsite macros, which has been a source of bugs in the past. This patch also drops the fused save_fl+cli operation, which doesn't really add much and makes things more complex - specifically because it breaks the 1:1 relationship between identifiers and offsets. If this operation turns out to be particularly beneficial, then the right answer is to define a new entrypoint for it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	98de032b68	[PATCH] i386: PARAVIRT: rename struct paravirt_patch to paravirt_patch_site for clarity Rename struct paravirt_patch to paravirt_patch_site, so that it clearly refers to a callsite, and not the patch which may be applied to that callsite. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Zachary Amsden <zach@vmware.com>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	d6dd61c831	[PATCH] x86: PARAVIRT: add hooks to intercept mm creation and destruction Add hooks to allow a paravirt implementation to track the lifetime of an mm. Paravirtualization requires three hooks, but only two are needed in common code. They are: arch_dup_mmap, which is called when a new mmap is created at fork arch_exit_mmap, which is called when the last process reference to an mm is dropped, which typically happens on exit and exec. The third hook is activate_mm, which is called from the arch-specific activate_mm() macro/function, and so doesn't need stub versions for other architectures. It's called when an mm is first used. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: linux-arch@vger.kernel.org Cc: James Bottomley <James.Bottomley@SteelEye.com> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:14 +02:00
Jeremy Fitzhardinge	5311ab62cd	[PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing Normally when running in PAE mode, the 4th PMD maps the kernel address space, which can be shared among all processes (since they all need the same kernel mappings). Xen, however, does not allow guests to have the kernel pmd shared between page tables, so parameterize pgtable.c to allow both modes of operation. There are several side-effects of this. One is that vmalloc will update the kernel address space mappings, and those updates need to be propagated into all processes if the kernel mappings are not intrinsically shared. In the non-PAE case, this is done by maintaining a pgd_list of all processes; this list is used when all process pagetables must be updated. pgd_list is threaded via otherwise unused entries in the page structure for the pgd, which means that the pgd must be page-sized for this to work. Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE pgd to page aligned anyway, so this patch forces the pgd to be page aligned+sized when the kernel pmd is unshared, to accomodate both these requirements. Also, since there may be several distinct kernel pmds (if the user/kernel split is below 3G), there's no point in allocating them from a slab cache; they're just allocated with get_free_page and initialized appropriately. (Of course the could be cached if there is just a single kernel pmd - which is the default with a 3G user/kernel split - but it doesn't seem worthwhile to add yet another case into this code). [ Many thanks to wli for review comments. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: William Lee Irwin III <wli@holomorphy.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	90caccb975	[PATCH] i386: PARAVIRT: Allocate a fixmap slot Allocate a fixmap slot for use by a paravirt_ops implementation. This is intended for early-boot bootstrap mappings. Once the zones and allocator have been set up, it would be better to use get_vm_area() to allocate some virtual space. Xen uses this to map the hypervisor's shared info page, which doesn't have a pseudo-physical page number, and therefore can't be mapped ordinarily. It is needed early because it contains the vcpu state, including the interrupt mask. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	b239fb2501	[PATCH] i386: PARAVIRT: Hooks to set up initial pagetable This patch introduces paravirt_ops hooks to control how the kernel's initial pagetable is set up. In the case of a native boot, the very early bootstrap code creates a simple non-PAE pagetable to map the kernel and physical memory. When the VM subsystem is initialized, it creates a proper pagetable which respects the PAE mode, large pages, etc. When booting under a hypervisor, there are many possibilities for what paging environment the hypervisor establishes for the guest kernel, so the constructon of the kernel's pagetable depends on the hypervisor. In the case of Xen, the hypervisor boots the kernel with a fully constructed pagetable, which is already using PAE if necessary. Also, Xen requires particular care when constructing pagetables to make sure all pagetables are always mapped read-only. In order to make this easier, kernel's initial pagetable construction has been changed to only allocate and initialize a pagetable page if there's no page already present in the pagetable. This allows the Xen paravirt backend to make a copy of the hypervisor-provided pagetable, allowing the kernel to establish any more mappings it needs while keeping the existing ones. A slightly subtle point which is worth highlighting here is that Xen requires all kernel mappings to share the same pte_t pages between all pagetables, so that updating a kernel page's mapping in one pagetable is reflected in all other pagetables. This makes it possible to allocate a page and attach it to a pagetable without having to explicitly enumerate that page's mapping in all pagetables. And: +From: "Eric W. Biederman" <ebiederm@xmission.com> If we don't set the leaf page table entries it is quite possible that will inherit and incorrect page table entry from the initial boot page table setup in head.S. So we need to redo the effort here, so we pick up PSE, PGE and the like. Hypervisors like Xen require that their page tables be read-only, which is slightly incompatible with our low identity mappings, however I discussed this with Jeremy he has modified the Xen early set_pte function to avoid problems in this area. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: William Irwin <bill.irwin@oracle.com> Cc: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	3dc494e86d	[PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries Add a set of accessors to pack, unpack and modify page table entries (at all levels). This allows a paravirt implementation to control the contents of pgd/pmd/pte entries. For example, Xen uses this to convert the (pseudo-)physical address into a machine address when populating a pagetable entry, and converting back to pphys address when an entry is read. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	4587623360	[PATCH] i386: PARAVIRT: use paravirt_nop to consistently mark no-op operations Add a _paravirt_nop function for use as a stub for no-op operations, and paravirt_nop #defined void * version to make using it easier (since all its uses are as a void ). This is useful to allow the patcher to automatically identify noop operations so it can simply nop out the callsite. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> [mingo] but only as a cleanup of the current open-coded (void ) casts. My problem with this is that it loses the types. Not that there is much to check for, but still, this adds some assumptions about how function calls look like	2007-05-02 19:27:13 +02:00
Rusty Russell	a75c54f933	[PATCH] i386: i386 separate hardware-defined TSS from Linux additions On Thu, 2007-03-29 at 13:16 +0200, Andi Kleen wrote: > Please clean it up properly with two structs. Not sure about this, now I've done it. Running it here. If you like it, I can do x86-64 as well. == lguest defines its own TSS struct because the "struct tss_struct" contains linux-specific additions. Andi asked me to split the struct in processor.h. Unfortunately it makes usage a little awkward. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:13 +02:00
Jeremy Fitzhardinge	d0175ab644	[PATCH] i386: Remove smp_alt_instructions The .smp_altinstructions section and its corresponding symbols are completely unused, so remove them. Also, remove stray #ifdef __KENREL__ in asm-i386/alternative.h Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:13 +02:00
H. Peter Anvin	4bc5aa91fb	[PATCH] x86: Clean up x86 control register and MSR macros (corrected) This patch is based on Rusty's recent cleanup of the EFLAGS-related macros; it extends the same kind of cleanup to control registers and MSRs. It also unifies these between i386 and x86-64; at least with regards to MSRs, the two had definitely gotten out of sync. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Andi Kleen	f039b75471	[PATCH] x86: Don't use MWAIT on AMD Family 10 It doesn't put the CPU into deeper sleep states, so it's better to use the standard idle loop to save power. But allow to reenable it anyways for benchmarking. I also removed the obsolete idle=halt on i386 Cc: andreas.herrmann@amd.com Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	c169859d6d	[PATCH] x86-64: Clean up asm-x86_64/bugs.h Most of asm-x86_64/bugs.h is code which should be in a C file, so put it there. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	1dbf527c51	[PATCH] i386: Make COMPAT_VDSO runtime selectable. Now that relocation of the VDSO for COMPAT_VDSO users is done at runtime rather than compile time, it is possible to enable/disable compat mode at runtime. This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the kernel command line, or via sysctl. (Switching on a running system shouldn't be done lightly; any process which was relying on the compat VDSO will be upset if it goes away.) The COMPAT_VDSO config option still exists, but if enabled it just makes vdso_enabled default to VDSO_COMPAT. +From: Hugh Dickins <hugh@veritas.com> Fix oops from i386-make-compat_vdso-runtime-selectable.patch. Even mingetty at system startup finds it easy to trigger an oops while reading /proc/PID/maps: though it has a good hold on the mm itself, that cannot stop exit_mm() from resetting tsk->mm to NULL. (It is usually show_map()'s call to get_gate_vma() which oopses, and I expect we could change that to check priv->tail_vma instead; but no matter, even m_start()'s call just after get_task_mm() is racy.) Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	d4f7a2c18e	[PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO Some versions of libc can't deal with a VDSO which doesn't have its ELF headers matching its mapped address. COMPAT_VDSO maps the VDSO at a specific system-wide fixed address. Previously this was all done at build time, on the grounds that the fixed VDSO address is always at the top of the address space. However, a hypervisor may reserve some of that address space, pushing the fixmap address down. This patch does the adjustment dynamically at runtime, depending on the runtime location of the VDSO fixmap. [ Patch has been through several hands: Jan Beulich wrote the orignal version; Zach reworked it, and Jeremy converted it to relocate phdrs as well as sections. ] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Zachary Amsden <zach@vmware.com> Cc: "Jan Beulich" <JBeulich@novell.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Roland McGrath <roland@redhat.com>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	a6c4e076ee	[PATCH] i386: clean up identify_cpu identify_cpu() is used to identify both the boot CPU and secondary CPUs, but it performs some actions which only apply to the boot CPU. Those functions are therefore really __init functions, but because they're called by identify_cpu(), they must be marked __cpuinit. This patch splits identify_cpu() into identify_boot_cpu() and identify_secondary_cpu(), and calls the appropriate init functions from each. Also, identify_boot_cpu() and all the functions it dominates are marked __init. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Jeremy Fitzhardinge	1353ebb4b4	[PATCH] i386: Clean up asm-i386/bugs.h Most of asm-i386/bugs.h is code which should be in a C file, so put it there. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-02 19:27:12 +02:00
Avi Kivity	bbf30a1650	[PATCH] x86-64: fix arithmetic in comment The xmm space on x86_64 is 256 bytes. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:12 +02:00
Andi Kleen	5d02d7ae73	[PATCH] x86-64: Use X86_EFLAGS_IF in x86-64/irqflags.h. As per i386 patch: move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we can include it from irqflags.h and use it in raw_irqs_disabled_flags(). As a side-effect, we could now use these flags in .S files. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Jan Beulich	b92e9fac40	[PATCH] x86: fix amd64-agp aperture validation Under CONFIG_DISCONTIGMEM, assuming that a !pfn_valid() implies all subsequent pfn-s are also invalid is wrong. Thus replace this by explicitly checking against the E820 map. AK: make e820 on x86-64 not initdata Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>	2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge	b00742d399	[PATCH] x86-64: Account for module percpu space separately from kernel percpu Rather than using a single constant PERCPU_ENOUGH_ROOM, compute it as the sum of kernel_percpu + PERCPU_MODULE_RESERVE. This is now common to all architectures; if an architecture wants to set PERCPU_ENOUGH_ROOM to something special, then it may do so (ia64 is the only one which does). Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge	07f3331c6b	[PATCH] i386: Add machine_ops interface to abstract halting and rebooting machine_ops is an interface for the machine_* functions defined in <linux/reboot.h>. This is intended to allow hypervisors to intercept the reboot process, but it could be used to implement other x86 subarchtecture reboots. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:11 +02:00
Jeremy Fitzhardinge	01a2f43556	[PATCH] i386: Add smp_ops interface Add a smp_ops interface. This abstracts the API defined by <linux/smp.h> for use within arch/i386. The primary intent is that it be used by a paravirtualizing hypervisor to implement SMP, but it could also be used by non-APIC-using sub-architectures. This is related to CONFIG_PARAVIRT, but is implemented unconditionally since it is simpler that way and not a highly performance-sensitive interface. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>	2007-05-02 19:27:11 +02:00
Rusty Russell	4fbb596881	[PATCH] i386: cleanup GDT Access Now we have an explicit per-cpu GDT variable, we don't need to keep the descriptors around to use them to find the GDT: expose cpu_gdt directly. We could go further and make load_gdt() pack the descriptor for us, or even assume it means "load the current cpu's GDT" which is what it always does. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:11 +02:00
Adrian Bunk	ca906e4231	[PATCH] x86: sys_ioperm() prototype cleanup - there's no reason for duplicating the prototype from include/linux/syscalls.h in include/asm-x86_64/unistd.h - every file should #include the headers containing the prototypes for it's global functions Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Christoph Lameter	2bff73830c	[PATCH] x86-64: use lru instead of page->index and page->private for pgd lists management. x86_64 currently simulates a list using the index and private fields of the page struct. Seems that the code was inherited from i386. But x86_64 does not use the slab to allocate pgds and pmds etc. So the lru field is not used by the slab and therefore available. This patch uses standard list operations on page->lru to realize pgd tracking. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Andi Kleen	b4531e863d	[PATCH] i386: Use X86_EFLAGS_IF in irqflags.h. Move X86_EFLAGS_IF et al out to a new header: processor-flags.h, so we can include it from irqflags.h and use it in raw_irqs_disabled_flags(). As a side-effect, we could now use these flags in .S files. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Jan Beulich	6fb14755a6	[PATCH] x86: tighten kernel image page access rights On x86-64, kernel memory freed after init can be entirely unmapped instead of just getting 'poisoned' by overwriting with a debug pattern. On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table can also be write-protected. Compared to the first version, this one prevents re-creating deleted mappings in the kernel image range on x86-64, if those got removed previously. This, together with the original changes, prevents temporarily having inconsistent mappings when cacheability attributes are being changed on such pages (e.g. from AGP code). While on i386 such duplicate mappings don't exist, the same change is done there, too, both for consistency and because checking pte_present() before using various other pte_XXX functions is a requirement anyway. At once, i386 code gets adjusted to use pte_huge() instead of open coding this. AK: split out cpa() changes Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Jan Beulich	d01ad8dd56	[PATCH] x86: Improve handling of kernel mappings in change_page_attr Fix various broken corner cases in i386 and x86-64 change_page_attr. AK: split off from tighten kernel image access rights Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:10 +02:00
Rusty Russell	90a0a06aa8	[PATCH] i386: rationalize paravirt wrappers paravirt.c used to implement native versions of all low-level functions. Far cleaner is to have the native versions exposed in the headers and as inline native_XXX, and if !CONFIG_PARAVIRT, then simply #define XXX native_XXX. There are several nice side effects: 1) write_dt_entry() now takes the correct "struct Xgt_desc_struct " not "void ". 2) load_TLS is reintroduced to the for loop, not manually unrolled with a #error in case the bounds ever change. 3) Macros become inlines, with type checking. 4) Access to the native versions is trivial for KVM, lguest, Xen and others who might want it. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@muc.de> Cc: Avi Kivity <avi@qumranet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Rusty Russell	d2cbcc49e2	[PATCH] i386: clean up cpu_init() We now have cpu_init() and secondary_cpu_init() doing nothing but calling _cpu_init() with the same arguments. Rename _cpu_init() to cpu_init() and use it as a replcement for secondary_cpu_init(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Rusty Russell	bf50467204	[PATCH] i386: Use per-cpu GDT immediately upon boot Now we are no longer dynamically allocating the GDT, we don't need the "cpu_gdt_table" at all: we can switch straight from "boot_gdt_table" to the per-cpu GDT. This means initializing the cpu_gdt array in C. The boot CPU uses the per-cpu var directly, then in smp_prepare_cpus() it switches to the per-cpu copy just allocated. For secondary CPUs, the early_gdt_descr is set to point directly to their per-cpu copy. For UP the code is very simple: it keeps using the "per-cpu" GDT as per SMP, but we never have to move. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Rusty Russell	ae1ee11be7	[PATCH] i386: Use per-cpu variables for GDT, PDA Allocating PDA and GDT at boot is a pain. Using simple per-cpu variables adds happiness (although we need the GDT page-aligned for Xen, which we do in a followup patch). [akpm@linux-foundation.org: build fix] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:10 +02:00
Ian Campbell	79e030114a	[PATCH] i386: Allow i386 crash kernels to handle x86_64 dumps The specific case I am encountering is kdump under Xen with a 64 bit hypervisor and 32 bit kernel/userspace. The dump created is 64 bit due to the hypervisor but the dump kernel is 32 bit for maximum compatibility. It's possibly less likely to be useful in a purely native scenario but I see no reason to disallow it. [akpm@linux-foundation.org: build fix] Signed-off-by: Ian Campbell <ian.campbell@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Cc: Horms <horms@verge.net.au> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
Rusty Russell	eab0c72aec	[PATCH] x86-64: Introduce load_TLS to the "for" loop. GCC (4.1 at least) unrolls it anyway, but I can't believe this code was ever justifiable. (I've also submitted a patch which cleans up i386, which is even uglier). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
Rusty Russell	692174b97d	[PATCH] i386: Initialize esp0 properly all the time Whenever we schedule, __switch_to calls load_esp0 which does: tss->esp0 = thread->esp0; This is never initialized for the initial thread (ie "swapper"), so when we're scheduling that, we end up setting esp0 to 0. This is fine: the swapper never leaves ring 0, so this field is never used. lguest, however, gets upset that we're trying to used an unmapped page as our kernel stack. Rather than work around it there, let's initialize it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
David Rientjes	8b8ca80e19	[PATCH] x86-64: configurable fake numa node sizes Extends the numa=fake x86_64 command-line option to allow for configurable node sizes. These nodes can be used in conjunction with cpusets for coarse memory resource management. The old command-line option is still supported: numa=fake=32 gives 32 fake NUMA nodes, ignoring the NUMA setup of the actual machine. But now you may configure your system for the node sizes of your choice: numa=fake=2512,1024,2256 gives two 512M nodes, one 1024M node, two 256M nodes, and the rest of system memory to a sixth node. The existing hash function is maintained to support the various node sizes that are possible with this implementation. Each node of the same size receives roughly the same amount of available pages, regardless of any reserved memory with its address range. The total available pages on the system is calculated and divided by the number of equal nodes to allocate. These nodes are then dynamically allocated and their borders extended until such time as their number of available pages reaches the required size. Configurable node sizes are recommended when used in conjunction with cpusets for memory control because it eliminates the overhead associated with scanning the zonelists of many smaller full nodes on page_alloc(). Cc: Andi Kleen <ak@suse.de> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:09 +02:00
john stultz	5a90cf205c	[PATCH] x86: Log reason why TSC was marked unstable Change mark_tsc_unstable() so it takes a string argument, which holds the reason the TSC was marked unstable. This is then displayed the first time mark_tsc_unstable is called. This should help us better debug why the TSC was marked unstable on certain systems and allow us to make sure we're not being overly paranoid when throwing out this troublesome clocksource. Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:08 +02:00
Vivek Goyal	1833d6bc72	[PATCH] i386: modpost apic related warning fixes o Modpost generates warnings for i386 if compiled with CONFIG_RELOCATABLE=y WARNING: vmlinux - Section mismatch: reference to .init.text:find_unisys_acpi_oem_table from .text between 'acpi_madt_oem_check' (at offset 0xc0101eda) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:acpi_get_table_header_early from .text between 'acpi_madt_oem_check' (at offset 0xc0101ef0) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'acpi_madt_oem_check' (at offset 0xc0101f2e) and 'enable_apic_mode' WARNING: vmlinux - Section mismatch: reference to .init.text:setup_unisys from .text between 'acpi_madt_oem_check' (at offset 0xc0101f37) and 'enable_apic_mode'WARNING: vmlinux - Section mismatch: reference to .init.text:parse_unisys_oem from .text between 'mps_oem_check' (at offset 0xc0101ec7) and 'acpi_madt_oem_check' WARNING: vmlinux - Section mismatch: reference to .init.text:es7000_sw_apic from .text between 'enable_apic_mode' (at offset 0xc0101f48) and 'check_apicid_present' o Some functions which are inline (acpi_madt_oem_check) are not inlined by compiler as these functions are accessed using function pointer. These functions are put in .text section and they in-turn access __init type functions hence modpost generates warnings. o Do not iniline acpi_madt_oem_check, instead make it __init. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:08 +02:00
Ravikiran G Thirumalai	e073ae1b34	[PATCH] x86-64: Set HASHDIST_DEFAULT to 1 for x86_64 NUMA Enable system hashtable memory to be distributed among nodes on x86_64 NUMA Forcing the kernel to use node interleaved vmalloc instead of bootmem for the system hashtable memory (alloc_large_system_hash) reduces the memory imbalance on node 0 by around 40MB on a 8 node x86_64 NUMA box: Before the following patch, on bootup of a 8 node box: Node 0 MemTotal: 3407488 kB Node 0 MemFree: 3206296 kB Node 0 MemUsed: 201192 kB Node 0 Active: 7012 kB Node 0 Inactive: 512 kB Node 0 Dirty: 0 kB Node 0 Writeback: 0 kB Node 0 FilePages: 1912 kB Node 0 Mapped: 420 kB Node 0 AnonPages: 5612 kB Node 0 PageTables: 468 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 5408 kB Node 0 SReclaimable: 644 kB Node 0 SUnreclaim: 4764 kB After the patch (or using hashdist=1 on the kernel command line): Node 0 MemTotal: 3407488 kB Node 0 MemFree: 3247608 kB Node 0 MemUsed: 159880 kB Node 0 Active: 3012 kB Node 0 Inactive: 616 kB Node 0 Dirty: 0 kB Node 0 Writeback: 0 kB Node 0 FilePages: 2424 kB Node 0 Mapped: 380 kB Node 0 AnonPages: 1200 kB Node 0 PageTables: 396 kB Node 0 NFS_Unstable: 0 kB Node 0 Bounce: 0 kB Node 0 Slab: 6304 kB Node 0 SReclaimable: 1596 kB Node 0 SUnreclaim: 4708 kB I guess it is a good idea to keep HASHDIST_DEFAULT "on" for x86_64 NUMA since x86_64 has no dearth of vmalloc space? Or maybe enable hash distribution for all 64bit NUMA arches? The following patch does it only for x86_64. I ran a HPC MPI benchmark -- 'Ansys wingsolid', which takes up quite a bit of memory and uses up tlb entries. This was on a 4 way, 2 socket Tyan AMD box (non vsmp), with 8G total memory (4G pernode). The results with and without hash distribution are: 1. Vanilla - runtime of 1188.000s 2. With hashdist=1 runtime of 1154.000s Oprofile output for the duration of run is: 1. Vanilla: PU: AMD64 processors, speed 2411.16 MHz (estimated) Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 500 samples % app name symbol name 163054 6.5513 libansys1.so MultiFront::decompose(int, int, Elemset , int , int, int, int) 162061 6.5114 libansys3.so blockSaxpy6L_fd 162042 6.5107 libansys3.so blockInnerProduct6L_fd 156286 6.2794 libansys3.so maxb33_ 87879 3.5309 libansys1.so elmatrixmultpcg_ 84857 3.4095 libansys4.so saxpy_pcg 58637 2.3560 libansys4.so .st4560 46612 1.8728 libansys4.so .st4282 43043 1.7294 vmlinux-t copy_user_generic_string 41326 1.6604 libansys3.so blockSaxpyBackSolve6L_fd 41288 1.6589 libansys3.so blockInnerProductBackSolve6L_fd 2. With hashdist=1 CPU: AMD64 processors, speed 2411.13 MHz (estimated) Counted L1_AND_L2_DTLB_MISSES events (L1 and L2 DTLB misses) with a unit mask of 0x00 (No unit mask) count 500 samples % app name symbol name 162993 6.9814 libansys1.so MultiFront::decompose(int, int, Elemset , int , int, int, int) 160799 6.8874 libansys3.so blockInnerProduct6L_fd 160459 6.8729 libansys3.so blockSaxpy6L_fd 156018 6.6826 libansys3.so maxb33_ 84700 3.6279 libansys4.so saxpy_pcg 83434 3.5737 libansys1.so elmatrixmultpcg_ 58074 2.4875 libansys4.so .st4560 46000 1.9703 libansys4.so .st4282 41166 1.7632 libansys3.so blockSaxpyBackSolve6L_fd 41033 1.7575 libansys3.so blockInnerProductBackSolve6L_fd 35762 1.5318 libansys1.so inner_product_sub 35591 1.5245 libansys1.so inner_product_sub2 28259 1.2104 libansys4.so addVectors Signed-off-by: Pravin B. Shelar <pravin.shelar@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Acked-by: Christoph Lameter <clameter@engr.sgi.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:08 +02:00
Andrew Morton	184c44d204	[PATCH] x86-64: fix x86_64-mm-sched-clock-share Fix for the following patch. Provide dummy cpufreq functions when CPUFREQ is not compiled in. Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:08 +02:00
Vivek Goyal	6a50a664ca	[PATCH] x86-64: build-time checking o X86_64 kernel should run from 2MB aligned address for two reasons. - Performance. - For relocatable kernels, page tables are updated based on difference between compile time address and load time physical address. This difference should be multiple of 2MB as kernel text and data is mapped using 2MB pages and PMD should be pointing to a 2MB aligned address. Life is simpler if both compile time and load time kernel addresses are 2MB aligned. o Flag the error at compile time if one is trying to build a kernel which does not meet alignment restrictions. Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:08 +02:00
Vivek Goyal	1ab60e0f72	[PATCH] x86-64: Relocatable Kernel Support This patch modifies the x86_64 kernel so that it can be loaded and run at any 2M aligned address, below 512G. The technique used is to compile the decompressor with -fPIC and modify it so the decompressor is fully relocatable. For the main kernel the page tables are modified so the kernel remains at the same virtual address. In addition a variable phys_base is kept that holds the physical address the kernel is loaded at. __pa_symbol is modified to add that when we take the address of a kernel symbol. When loaded with a normal bootloader the decompressor will decompress the kernel to 2M and it will run there. This both ensures the relocation code is always working, and makes it easier to use 2M pages for the kernel and the cpu. AK: changed to not make RELOCATABLE default in Kconfig Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	0dbf7028c0	[PATCH] x86: __pa and __pa_symbol address space separation Currently __pa_symbol is for use with symbols in the kernel address map and __pa is for use with pointers into the physical memory map. But the code is implemented so you can usually interchange the two. __pa which is much more common can be implemented much more cheaply if it is it doesn't have to worry about any other kernel address spaces. This is especially true with a relocatable kernel as __pa_symbol needs to peform an extra variable read to resolve the address. There is a third macro that is added for the vsyscall data __pa_vsymbol for finding the physical addesses of vsyscall pages. Most of this patch is simply sorting through the references to __pa or __pa_symbol and using the proper one. A little of it is continuing to use a physical address when we have it instead of recalculating it several times. swapper_pgd is now NULL. leave_mm now uses init_mm.pgd and init_mm.pgd is initialized at boot (instead of compile time) to the physmem virtual mapping of init_level4_pgd. The physical address changed. Except for the for EMPTY_ZERO page all of the remaining references to __pa_symbol appear to be during kernel initialization. So this should reduce the cost of __pa in the common case, even on a relocated kernel. As this is technically a semantic change we need to be on the lookout for anything I missed. But it works for me (tm). Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	cfd243d4af	[PATCH] x86-64: Remove the identity mapping as early as possible With the rewrite of the SMP trampoline and the early page allocator there is nothing that needs identity mapped pages, once we start executing C code. So add zap_identity_mappings into head64.c and remove zap_low_mappings() from much later in the code. The functions are subtly different thus the name change. This also kills boot_level4_pgt which was from an earlier attempt to move the identity mappings as early as possible, and is now no longer needed. Essentially I have replaced boot_level4_pgt with trampoline_level4_pgt in trampoline.S Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	7db681d7e4	[PATCH] x86-64: wakeup.S rename registers to reflect right names o Use appropriate names for 64bit regsiters. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	3c321bceb4	[PATCH] x86-64: Add EFER to the register set saved by save_processor_state EFER varies like %cr4 depending on the cpu capabilities, and which cpu capabilities we want to make use of. So save/restore it make certain we have the same EFER value when we are done. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	30f4728954	[PATCH] x86-64: cleanup segments Move __KERNEL32_CS up into the unused gdt entry. __KERNEL32_CS is used when entering the kernel so putting it first is useful when trying to keep boot gdt sizes to a minimum. Set the accessed bit on all gdt entries. We don't care so there is no need for the cpu to burn the extra cycles, and it potentially allows the pages to be immutable. Plus it is confusing when debugging and your gdt entries mysteriously change. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:07 +02:00
Vivek Goyal	67dcbb6bc6	[PATCH] x86-64: Clean up the early boot page table - Merge physmem_pgt and ident_pgt, removing physmem_pgt. The merge is broken as soon as mm/init.c:init_memory_mapping is run. - As physmem_pgt is gone don't export it in pgtable.h. - Use defines from pgtable.h for page permissions. - Fix the physical memory identity mapping so it is at the correct address. - Remove the physical memory mapping from wakeup_level4_pgt it is at the wrong address so we can't possibly be usinging it. - Simply NEXT_PAGE the work to calculate the phys_ alias of the labels was very cool. Unfortuantely it was a brittle special purpose hack that makes maitenance more difficult. Instead just use label - __START_KERNEL_map like we do everywhere else in assembly. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Vivek Goyal	9d291e787b	[PATCH] x86-64: Assembly safe page.h and pgtable.h This patch makes pgtable.h and page.h safe to include in assembly files like head.S. Allowing us to use symbolic constants instead of hard coded numbers when refering to the page tables. This patch copies asm-sparc64/const.h to asm-x86_64 to get a definition of _AC() a very convinient macro that allows us to force the type when we are compiling the code in C and to drop all of the type information when we are using the constant in assembly. Previously this was done with multiple definition of the same constant. const.h was modified slightly so that it works when given CONFIG options as arguments. This patch adds #ifndef __ASSEMBLY__ ... #endif and _AC(1,UL) where appropriate so the assembler won't choke on the header files. Otherwise nothing should have changed. AK: added const.h to exported headers to fix headers_check Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Stephen Hemminger	e658450455	[PATCH] x86-64: dma_ops as const The dma_ops structure can be const since it never changes after boot. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Joerg Roedel	6b37f5a20c	[PATCH] x86-64: fix cpu MHz reporting on constant_tsc cpus This patch fixes the reporting of cpu_mhz in /proc/cpuinfo on CPUs with a constant TSC rate and a kernel with disabled cpufreq. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> arch/x86_64/kernel/apic.c \| 2 - arch/x86_64/kernel/time.c \| 58 +++++++++++++++++++++++++++++++++++++++--- arch/x86_64/kernel/tsc.c \| 12 +++++--- arch/x86_64/kernel/tsc_sync.c \| 2 - include/asm-x86_64/proto.h \| 1 5 files changed, 65 insertions(+), 10 deletions(-)	2007-05-02 19:27:06 +02:00
Glauber de Oliveira Costa	fbc16f2c2a	[PATCH] x86-64: Remove duplicated code for reading control registers On Tue, Mar 13, 2007 at 05:33:09AM -0700, Randy.Dunlap wrote: > On Tue, 13 Mar 2007, Glauber de Oliveira Costa wrote: > > > Tiny cleanup: > > > > In x86_64, the same functions for reading cr3 and writing cr{3,4} are > > defined in tlbflush.h and system.h, whith just a name change. > > The only difference is the clobbering of memory, which seems a safe, and > > even needed change for the write_cr4. This patch removes the duplicate. > > write_cr3() is moved to system.h for consistency. > > missing patch..... > thanks. Attached now -- Glauber de Oliveira Costa Red Hat Inc. "Free as in Freedom" Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Rusty Russell	f9d09645d6	[PATCH] x86-64: Remove unused set_seg_base The set_seg_base function isn't used anywhere (2.6.21-rc3-git1) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Jeremy Fitzhardinge	973efae21b	[PATCH] i386: clean up mach_reboot_fixups The reboot_fixups stuff seems to be a bit of a mess, specifically the header is in linux/ when its a purely i386-specific piece of code. I'm not sure why it has its config option; its only currently needed for "geode-gx1/cs5530a", so perhaps whatever config option controls that hardware should enable this? Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Aneesh Kumar K.V	6d1c426158	[PATCH] i386: Update __copy_to_user_inatomic linuxdoc description Explicity specify that the caller should pin the user memory otherwise the function will sleep Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@gmail.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:06 +02:00
Prarit Bhargava	86c0baf123	[PATCH] i386: Change sysenter_setup to __cpuinit & improve __INIT, __INITDATA Change sysenter_setup to __cpuinit. Change __INIT & __INITDATA to be cpu hotplug aware. Resolve MODPOST warnings similar to: WARNING: vmlinux - Section mismatch: reference to .init.text:sysenter_setup from .text between 'identify_cpu' (at offset 0xc040a380) and 'detect_ht' and WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_end from .text between 'sysenter_setup' (at offset 0xc041a269) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_int80_start from .text between 'sysenter_setup' (at offset 0xc041a26e) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_sysenter_end from .text between 'sysenter_setup' (at offset 0xc041a275) and 'enable_sep_cpu' WARNING: vmlinux - Section mismatch: reference to .init.data:vsyscall_sysenter_start from .text between 'sysenter_setup' (at offset 0xc041a27a) and 'enable_sep_cpu' Signed-off-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Andi Kleen	803d80f650	[PATCH] x86-64: Some cleanup in time.c Move prototypes into header files Remove unneeded includes. Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Simon Arlott	0949be3509	[PATCH] i386: Add an option for the VIA C7 which sets appropriate L1 cache The VIA C7 is a 686 (with TSC) that supports MMX, SSE and SSE2, it also has a cache line length of 64 according to http://www.digit-life.com/articles2/cpu/rmma-via-c7.html. This patch sets gcc to -march=686 and select s the correct cache shift. Signed-off-by: Simon Arlott <simon@fire.lp0.eu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-05-02 19:27:05 +02:00
takada	f5e8861583	[PATCH] i386: pit_latch_buggy has no effect Eliminated the arch/i386/kernel/timers in 2.6.18, use clocksoures instead. pit_latch_buggy was referred in timers/timer_tsc.c, and currently removed. Therefore nobody refer it. Until 2.6.17, MediaGX's TSC works correctly. after 2.6.18, warned "TSC appears to be running slowly. Marking it as unstable". So marked unstable TSC when CS55x0. Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Jeremy Fitzhardinge	f76c392380	[PATCH] i386: No need to use -traditional for processing asm in i386/kernel/ No need to use -traditional for processing asm in i386/kernel/ Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Jan Beulich	9964cf7d77	[PATCH] x86: consolidate smp_send_stop() Synchronize i386's smp_send_stop() with x86-64's in only try-locking the call lock to prevent deadlocks when called from panic(). In both version, disable interrupts before clearing the CPU off the online map to eliminate races with IRQ handlers inspecting this map. Also in both versions, save/restore interrupts rather than disabling/ enabling them. On x86-64, eliminate one function used here by folding it into its single caller, convert to static, and rename for consistency with i386 (lkcd may like this). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:05 +02:00
Jan Beulich	b0354795c9	[PATCH] x86-64: adjust inclusion of asm/vsyscall32.h Avoid including asm/vsyscall32.h in virtually every source file. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Jan Beulich	00f1ea6967	[PATCH] x86: adjust inclusion of asm/fixmap.h Move inclusion of asm/fixmap.h to where it is really used rather than where it may have been used long ago (requires a few other adjustments to includes due to previous implicit dependencies). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Ingo Molnar	3c43f03908	[PATCH] x86: default to physical mode on hotplug CPU kernels Default to physical mode on hotplug CPU kernels. Furher simplify and clean up the APIC initialization code. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-05-02 19:27:04 +02:00
Ingo Molnar	07c7c47444	[PATCH] x86-64: always use physical delivery mode on > 8 CPUs Remove clustered APIC mode. There's little point in the use of clustered APIC mode, broadcasting is limited to within the cluster only, and chipsets have bugs in this area as well. So default to physical APIC mode when the CPU count is large, and default to logical APIC mode when the CPU count is 8 or smaller. (this patch only removes the use of genapic_cluster and cleans up the resulting genapic.c file - removal of all remaining traces of clustered mode will be done by another patch.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org>	2007-05-02 19:27:04 +02:00
Andrew Morton	a86f34b49f	[PATCH] x86: revert x86_64-mm-fix-the-irqbalance-quirk-for-e7320-e7520-e7525 Obsoleted by Ingo's genapic stuff. Cc: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Andrew Morton	3dc68d9b58	[PATCH] x86-64: revert x86_64-mm-add-genapic_force This is obsoleted by new Ingo genapic patches. Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "Li, Shaohua" <shaohua.li@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Andi Kleen <ak@suse.de>	2007-05-02 19:27:04 +02:00
Herbert Xu	e196d62591	[CRYPTO] api: Add ablkcipher_request_set_tfm This patch adds ablkcipher_request_set_tfm for those users that need to manage the memory for ablkcipher requests directly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:33 +10:00
Herbert Xu	124b53d020	[CRYPTO] cryptd: Add software async crypto daemon This patch adds the cryptd module which is a template that takes a synchronous software crypto algorithm and converts it to an asynchronous one by executing it in a kernel thread. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:32 +10:00
Herbert Xu	a73e69965f	[CRYPTO] api: Do not remove users unless new algorithm matches As it is whenever a new algorithm with the same name is registered users of the old algorithm will be removed so that they can take advantage of the new algorithm. This presents a problem when the new algorithm is not equivalent to the old algorithm. In particular, the new algorithm might only function on top of the existing one. Hence we should not remove users unless they can make use of the new algorithm. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:32 +10:00
Herbert Xu	b5b7f08869	[CRYPTO] api: Add async blkcipher type This patch adds the mid-level interface for asynchronous block ciphers. It also includes a generic queueing mechanism that can be used by other asynchronous crypto operations in future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:31 +10:00
Herbert Xu	ebc610e5bc	[CRYPTO] templates: Pass type/mask when creating instances This patch passes the type/mask along when constructing instances of templates. This is in preparation for templates that may support multiple types of instances depending on what is requested. For example, the planned software async crypto driver will use this construct. For the moment this allows us to check whether the instance constructed is of the correct type and avoid returning success if the type does not match. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:31 +10:00
Herbert Xu	32e3983fe5	[CRYPTO] api: Add async block cipher interface This patch adds the frontend interface for asynchronous block ciphers. In addition to the usual block cipher parameters, there is a callback function pointer and a data pointer. The callback will be invoked only if the encrypt/decrypt handlers return -EINPROGRESS. In other words, if the return value of zero the completion handler (or the equivalent code) needs to be invoked by the caller. The request structure is allocated and freed by the caller. Its size is determined by calling crypto_ablkcipher_reqsize(). The helpers ablkcipher_request_alloc/ablkcipher_request_free can be used to manage the memory for a request. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2007-05-02 14:38:30 +10:00
Haavard Skinnemoen	1c23af90dc	i2c: Bitbanging I2C bus driver using the GPIO API This is a very simple bitbanging I2C bus driver utilizing the new arch-neutral GPIO API. Useful for chips that don't have a built-in I2C controller, additional I2C busses, or testing purposes. To use, include something similar to the following in the board-specific setup code: #include <linux/i2c-gpio.h> static struct i2c_gpio_platform_data i2c_gpio_data = { .sda_pin = GPIO_PIN_FOO, .scl_pin = GPIO_PIN_BAR, }; static struct platform_device i2c_gpio_device = { .name = "i2c-gpio", .id = 0, .dev = { .platform_data = &i2c_gpio_data, }, }; Register this platform_device, set up the I2C pins as GPIO if required and you're ready to go. This will use default values for udelay and timeout, and will work with GPIO hardware that does not support open drain mode, but allows sensing of the SDA and SCL lines even when they are being driven. Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:34 +02:00
Jean Delvare	b86a1bc8e3	i2c: Restore i2c_smbus_read_block_data Add back the i2c_smbus_read_block_data helper function, it is needed by the upcoming lm93 hardware monitoring driver and possibly others. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:34 +02:00
Jean Delvare	424ed67c7d	i2c-algo-bit: Implement a 50/50 SCL duty cycle The original i2c-algo-bit implementation uses a 33/66 SCL duty cycle when bits are being written on the bus. While the I2C specification doesn't forbid it, this prevents us from driving the I2C bus to its max speed, limiting us to 66 kbps max on standard I2C busses. Implementing a 50/50 duty cycle instead lets us max out the bandwidth up to the theoretical max of 100 kbps on standard I2C busses. This is particularly important when large amounts of data need to be transfered over the bus, as is the case with some TV adapters when the firmware is being uploaded. In fact this change even allows, at least in theory, fast-mode I2C support at 125, 166 and 250 kbps. There's no way to reach the theoretical max of 400 kbps with this implementation. But I don't think we want to put efforts in that direction anyway: software-driven I2C is very CPU-intensive and bad for latency. Other timing changes: * Don't set SDA high explicitly on error, we're going to issue a stop condition before we leave anyway. * If an error occurs when sending the slave address, yield the CPU before retrying, and remove the additional delay after the new start condition. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:33 +02:00
Bryan Wu	d24ecfcc39	i2c: Blackfin Two Wire Interface driver The i2c linux driver for blackfin architecture which supports blackfin on-chip TWI controller i2c operation. Signed-off-by: Bryan Wu <bryan.wu@analog.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:32 +02:00
Jean Delvare	b3e820968a	i2c: Make i2c_del_driver a void function Make i2c_del_driver a void function, like all other driver removal functions. It always returned 0 even when errors occured, and nobody ever actually checked the return value anyway. And we cannot fail a module removal anyway. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:32 +02:00
Jean Delvare	a97f1ed090	i2c: Move i2c-isa-only exported symbol declarations Move the declaration of i2c-isa-only exported symbols to i2c-isa itself, that's the best way to ensure nobody will attempt to use them. Hopefully we'll get rid of the exports themselves soon anyway. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:32 +02:00
Jean Delvare	12b5053ac5	i2c: Add i2c_new_probed_device() Add a new helper function to instantiate an i2c device. It is meant as a replacement for i2c_new_device() when you don't know for sure at which address your I2C/SMBus device lives. This happens frequently on TV adapters for example, you know there is a tuner chip on the bus, but depending on the exact board model and revision, it can live at different addresses. So, the new i2c_new_probed_device() function will probe the bus according to a list of addresses, and as soon as one of these addresses responds, it will call i2c_new_device() on that one address. This function will make it possible to port the old i2c drivers to the new model quickly. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
Jean Delvare	0f3b483852	i2c-algo-bit: Add i2c_bit_add_numbered_bus Add i2c_bit_add_numbered_bus(), which is equivalent to i2c_bit_add_bus except that it calls i2c_add_numbered_adapter() at the end instead of i2c_add_adapter(). Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
David Brownell	6e13e64184	i2c: Add i2c_add_numbered_adapter() This adds a call, i2c_add_numbered_adapter(), registering an I2C adapter with a specific bus number and then creating I2C device nodes for any pre-declared devices on that bus. It builds on previous patches adding I2C probe() and remove() support, and that pre-declaration of devices. This completes the core support for "new style" I2C device drivers. Those follow the standard driver model for binding devices to drivers (using probe and remove methods) rather than a legacy model (where the driver tries to autoconfigure each bus, and registers devices itself). Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
David Brownell	9c1600eda4	i2c: Add i2c_board_info and i2c_new_device() This provides partial support for new-style I2C driver binding. It builds on "struct i2c_board_info" declarations that identify I2C devices on a given board. This is needed on systems with I2C devices that can't be fully probed and/or autoconfigured, such as many embedded Linux configurations where the way a given I2C device is wired may affect how it must be used. There are two models for declaring such devices: * LATE -- using a public function i2c_new_device(). This lets modules declare I2C devices found AFTER a given I2C adapter becomes available. For example, a PCI card could create adapters giving access to utility chips on that card, and this would be used to associate those chips with those adapters. * EARLY -- from arch_initcall() level code, using a non-exported function i2c_register_board_info(). This copies the declarations BEFORE such an i2c_adapter becomes available, arranging that i2c_new_device() will be called later when i2c-core registers the relevant i2c_adapter. For example, arch/.../.../board-*.c files would declare the I2C devices along with their platform data, and I2C devices would behave much like PNPACPI devices. (That is, both enumerate from board-specific tables.) To match the exported i2c_new_device(), the previously-private function i2c_unregister_device() is now exported. Pending later patches using these new APIs, this is effectively a NOP. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:31 +02:00
David Brownell	a1d9e6e49f	i2c: i2c stack can remove() More update for new style driver support: add a remove() method, and use it in the relevant code paths. Again, nothing will use this yet since there's nothing to create devices feeding this infrastructure. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:30 +02:00
David Brownell	7b4fbc50fa	i2c: i2c stack can probe() One of a series of I2C infrastructure updates to support enumeration using the standard Linux driver model. This patch updates probe() and associated hotplug/coldplug support, but not remove(). Nothing yet _uses_ it to create I2C devices, so those hotplug/coldplug mechanisms will be the only externally visible change. This patch will be an overall NOP since the I2C stack doesn't yet create clients/devices except as part of binding them to legacy drivers. Some code is moved earlier in the source code, helping group more of the per-device infrastructure in one place and simplifying handling per-device attributes. Terminology being adopted: "legacy drivers" create devices (i2c_client) themselves, while "new style" ones follow the driver model (the i2c_client is handed to the probe routine). It's an either/or thing; the two models don't mix, and drivers that try mixing them won't even be registered. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:30 +02:00
Jean Delvare	7c59b6615f	i2c: Cleanup the includes of <linux/i2c.h> Clean up the includes of <linux/i2c.h>. Only include this header file when we actually need it. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:29 +02:00
Jean Delvare	f75803de6a	i2c-nforce2: Add support for the MCP61 and MCP65 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Hans-Frieder Vogt <hfvogt@gmx.net>	2007-05-01 23:26:29 +02:00
Jean Delvare	209d27c3b1	i2c: Emulate SMBus block read over I2C Let the I2C bus drivers emulate the SMBus Block Read and Block Process Call transactions if they wish. This requires to define a new message flag, which i2c-core will use to let the underlying I2C bus driver know that the first received byte will specify the length of the read message. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:29 +02:00
David Brownell	ef2c8321f5	i2c: Rename dev_to_i2c_adapter() Rename dev_to_i2c_adapter() as to_i2c_adapter(), since the previous syntax was a surprising and needless difference from normal naming conventions in Linux. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:28 +02:00
David Brownell	2096b956d2	i2c: Shrink struct i2c_client This shrinks the size of "struct i2c_client" by 40 bytes: - Substantially shrinks the string used to identify the chip type - The "flags" don't need to be so big - Removes some internal padding It also adds kerneldoc for that struct, explaining how "name" is really a chip type identifier; it's otherwise potentially confusing. Because the I2C_NAME_SIZE symbol was abused for both i2c_client.name and for i2c_adapter.name, this needed to affect i2c_adapter too. The adapters which used that symbol now use the more-obviously-correct idiom of taking the size of that field. JD: Shorten i2c_adapter.name from 50 to 48 bytes while we're here, to avoid wasting space in padding. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:28 +02:00
Jean Delvare	b31366f439	i2c: i2c_adapter devices need no driver Kill i2c_adapter_driver as it doesn't make sense and it prevents further i2c-core cleanups. i2c_adapter devices are virtual devices (ex-class devices) and as such they don't need a driver. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:28 +02:00
Jean Delvare	fccb56e4d8	i2c: Kill i2c_adapter.class_dev Kill i2c_adapter.class_dev. Instead, set the class of i2c_adapter.dev to i2c_adapter_class, so that a symlink will be created for every i2c_adapter in /sys/class/i2c-adapter. The same change must be mirrored to i2c-isa as it duplicates some of the i2c-core functionalities. User-space tools and libraries might need some adjustments. In particular, libsensors from lm_sensors 2.10.3 or later is required for proper discovery of i2c adapter names after this change. Signed-off-by: Jean Delvare <khali@linux-fr.org>	2007-05-01 23:26:27 +02:00
Christoph Hellwig	f3402a4e52	[VOYAGER] Convert the monitor thread to use the kthread API full kthread conversion on the voyager power switch handling thread. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>	2007-05-01 10:09:29 -05:00
Pierre Ossman	bd76631261	mmc: remove old card states Remove card states that no longer make any sense. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 16:11:57 +02:00
Philip Langdale	55556da012	MMC: Fix handling of low-voltage cards Fix handling of low voltage MMC cards. The latest MMC and SD specs both agree that support for low-voltage operations is indicated by bit 7 in the OCR. The MMC spec states that the low voltage range is 1.65-1.95V while the SD spec leaves the actual voltage range undefined - meaning that there is still no such thing as a low voltage SD card. However, an old Sandisk spec implied that bits 7.0 represented voltages below 2.0V in 1V or 0.5V increments, and the code was accordingly written with that expectation. This confusion meant that host drivers attempting to support the typical low voltage (1.8V) would set the wrong bits in the host OCR mask (usually bits 5 and/or 6) resulting in the the low voltage mode never being used. This change corrects the low voltage range and adds sanity checks on the reserved bits (0-6) and for SD cards that claim to support low-voltage operations. Signed-off-by: Philip Langdale <philipl@overt.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 14:14:50 +02:00
Philip Langdale	4be34c99a2	MMC: Consolidate voltage definitions Consolidate the list of available voltages. Up until now, a separate set of defines has been used for host->vdd than that used for the OCR voltage mask values. Having two sets of defines allows them to get out of sync and the current sets are already inconsistent with one claiming to describe ranges and the other specific voltages. Only the SDHCI driver uses the host->vdd defines and it is easily fixed to use the OCR defines. Signed-off-by: Philip Langdale <philipl@overt.org> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:42:28 +02:00
Pierre Ossman	7ea239d9e6	mmc: add bus handler Delegate protocol handling to "bus handlers". This allows the core to just handle the task of arbitrating the bus. Initialisation and pampering of cards is now done by the different bus handlers. This design also allows MMC and SD (and later SDIO) to be more cleanly separated, allowing easier maintenance. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:41:06 +02:00
Pierre Ossman	da7fbe58d2	mmc: Separate out protocol ops Move protocol operations and definitions into their own files in an effort to separate protocol handling and bus arbitration more clearly. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:18 +02:00
Pierre Ossman	aaac1b470b	mmc: Move core functions to subdir Create a "core" subdirectory to house the central bus handling functions. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:18 +02:00
Pierre Ossman	b855885e3b	mmc: deprecate mmc bus topology The classic MMC bus was defined as multi card bus system, which is reflected in the design in the MMC layer. When SD showed up, the bus topology was abandoned and a star topology (one card per host) was mandated. MMC version 4 has followed this, officially deprecating the bus topology. As we do not have any known users of the bus topology we can remove support for it. This will simplify the code and rectify some incorrect assumptions in the newer additions. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:18 +02:00
Pierre Ossman	3b91e5507c	mmc: Flush pending detects on host removal Make sure we kill of any pending detection runs when the host is removed instead of when it is freed. Also add some debugging to make sure the driver doesn't queue up more detection after it has removed the host. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:17 +02:00
Pierre Ossman	f74d132cec	mmc: Move OCR bit defines All host drivers were #include:ing mmc/protocol.h just to get access to the OCR bit defines. Move these to host.h instead. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:16 +02:00
Pierre Ossman	9c2c0af950	mmc: add type field to cards Split out the type of card into its own field as it hardly qualifies as a state. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:16 +02:00
Pierre Ossman	85a18ad93e	mmc: MMC sector based cards Support for MMC 4.2 sector based cards. This tweaks the init a bit and reads a new field out of the EXT_CSD. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:15 +02:00
Alex Dubov	91f8d0118a	tifm: layout fixes, small changes to comments and printfs Cosmetic changes to the code. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:15 +02:00
Alex Dubov	13cdf48ef1	tifm_sd: implement software scatter-gather It was found that delays associated with issue and completion of the commands severely limit performance of the new, fast SD cards. To alleviate this issue scatter-gather emulation in software is implemented for both dma and pio transfer modes. Non-block aligned and high memory sg entries are accounted for. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:15 +02:00
Alex Dubov	72dc9d9619	tifm_sd: replace command completion state machine with full checking State machine used to to track mmc command state was found to be fragile and unreliable, making many cards unusable. The safer solution is to perform all needed checks at every card event. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:14 +02:00
Alex Dubov	2428a8fe22	tifm: move common device management tasks from tifm_7xx1 to tifm_core Some details of the device management (create, add, remove) are really belong to the tifm_core, as they are not hardware specific. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	6113ed73e6	tifm: move common adapter management tasks from tifm_7xx1 to tifm_core Some details of the adapter management (create, add, remove) are really belong to the tifm_core, as they are not hardware specific. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	3540af8ffd	tifm: replace per-adapter kthread with freezeable workqueue Freezeable workqueue makes sure that adapter work items (device insertions and removals) would be handled after the system is fully resumed. Previously this was achieved by explicit freezing of the kthread. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	e23f2b8a1a	tifm: simplify bus match and uevent handlers Remove code duplicating the kernel functionality and clean up data structures involved in driver matching. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:13 +02:00
Alex Dubov	8dc4a61eca	tifm: use bus methods to handle probe/remove instead of driver ones. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:12 +02:00
Alex Dubov	4552f0cbd4	tifm: hide details of interrupt processing from socket drivers Instead of passing transformed value of adapter interrupt status to socket drivers, implement two separate callbacks - one for card events and another for dma events. Signed-off-by: Alex Dubov <oakad@yahoo.com> Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>	2007-05-01 13:04:12 +02:00
David Teigland	72c2be776b	[DLM] interface for purge (2/2) Add code to accept purge commands from userland. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>	2007-05-01 09:11:12 +01:00
Steve Dickson	74dd34e6e8	NFS: Added support to turn off the NFSv3 READDIRPLUS RPC. READDIRPLUS can be a performance hindrance when the client is working with large directories. In addition, some servers still have bugs in their implementations (e.g. Tru64 returns wrong values for the fsid). Add a mount flag to enable users to turn it off at mount time following the implementation in Apple's NFS client. Signed-off-by: Steve Dickson <steved@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:16 -07:00
Chuck Lever	4c2eaf073f	SUNRPC: remove old portmapper net/sunrpc/pmap_clnt.c has been replaced by net/sunrpc/rpcb_clnt.c. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:15 -07:00
Chuck Lever	a509050bd3	SUNRPC: introduce rpcbind: replacement for in-kernel portmapper Introduce a replacement for the in-kernel portmapper client that supports all 3 versions of the rpcbind protocol. This code is not used yet. Original code by Groupe Bull updated for the latest kernel, with multiple bug fixes. Note that rpcb_clnt.c does not yet support registering via versions 3 and 4 of the rpcbind protocol. That is planned for a later patch. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:12 -07:00
Chuck Lever	c5a4dd8b7c	SUNRPC: Eliminate side effects from rpc_malloc Currently rpc_malloc sets req->rq_buffer internally. Make this a more generic interface: return a pointer to the new buffer (or NULL) and make the caller set req->rq_buffer and req->rq_bufsize. This looks much more like kmalloc and eliminates the side effects. To fix a potential deadlock, this patch also replaces GFP_NOFS with GFP_NOWAIT in rpc_malloc. This prevents async RPCs from sleeping outside the RPC's task scheduler while allocating their buffer. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:11 -07:00
Chuck Lever	2bea90d43a	SUNRPC: RPC buffer size estimates are too large The RPC buffer size estimation logic in net/sunrpc/clnt.c always significantly overestimates the requirements for the buffer size. A little instrumentation demonstrated that in fact rpc_malloc was never allocating the buffer from the mempool, but almost always called kmalloc. To compute the size of the RPC buffer more precisely, split p_bufsiz into two fields; one for the argument size, and one for the result size. Then, compute the sum of the exact call and reply header sizes, and split the RPC buffer precisely between the two. That should keep almost all RPC buffers within the 2KiB buffer mempool limit. And, we can finally be rid of RPC_SLACK_SPACE! Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:10 -07:00
Chuck Lever	511d2e8855	NLM: Shrink the maximum request size of NLM4 requests NLM version 4 requests estimate the call and reply header sizes rather conservatively, using the very maximum size allowed in the protocol even though Linux always uses only a small fraction of the allowable space. Reduce the size of caller and lock arguments to conserve RPC buffer space while XDR encoding NLM4 arguments. Add compile-time checks to ensure the hostname string won't overflow NLM protocol maximums. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	ca52fec152	NFS: Use pgoff_t in structures and functions that pass page cache offsets Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:09 -07:00
Trond Myklebust	8d5658c949	NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-04-30 22:17:07 -07:00

... 2 3 4 5 6 ...

13363 Commits