linux

Author	SHA1	Message	Date
Rusty Russell	bb4093deb2	lguest: restore boot speed lguest is dumb and drops all the pagetables for set_pte (which is only used for kernel mapping manipulation, so it's OK without highmem). But it's used a lot in boot, too. As a guest optimization, we suppressed this flushing until the first page switch. Now we have initial_page_table, that happens much earlier, so extend the heuristic to wait until we switch to something other than the swapper_pg_dir or initial_page_table. As measured on my laptop under kvm, this dropped the time-to-mount-root from 48 seconds to 4.3 seconds. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2010-12-16 17:03:15 +10:30
Rusty Russell	bb6f1d9a99	lguest: fix crash lguest_time_init `fe25c7fc2e` "x86: lguest: Convert to new irq chip functions" converted enable_lguest_irq() to take a struct irq_data *, but didn't fix the one internal caller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: x86@kernel.org	2010-12-16 17:03:14 +10:30
Andres Salomon	b5318d302f	x86, olpc: Speed up device tree creation during boot Calling alloc_bootmem() for tiny chunks of memory over and over is really slow; on an XO-1, it caused the time between when the kernel started booting and when the display came alive (post-lxfb probe) to increase to 44s. This patch optimizes the prom_early_alloc function by calling alloc_bootmem for 4k-sized blocks of memory, and handing out chunks of that to callers. With this patch, the time between kernel load and display initialization decreased to 23s. If there's a better way to do this early in the boot process, please let me know. (Note: increasing the chunk size to 16k didn't noticably affect boot time, and wasted 9k.) v4: clarify comment, requested by hpa v3: fix wasted memory buglet found by Milton Miller, and style fix. v2: reorder prom_early_alloc as suggested by Grant. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101129153951.74202a84@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:40 -08:00
Andres Salomon	c10d1e260f	x86, olpc: Add OLPC device-tree support Make use of PROC_DEVICETREE to export the tree, and sparc's PROMTREE code to call into OLPC's Open Firmware to build the tree. v5: fix buglet with root node check (introduced in v4) v4: address some minor style issues pointed out by Grant, and explicitly cast negative phandle checks to s32. v3: rename olpc_prom to olpc_dt - rework Kconfig entries - drop devtree build hook from proc, instead adding a call to x86's paging_init (similarly to how sparc64 does it) - switch allocation from using slab to alloc_bootmem. this allows the DT to be built earlier during boot (during setup_arch); the downside is that there are some 1200 bootmem reservations that are done during boot. Not ideal.. - add a helper olpc_ofw_is_installed function to test for the existence and successful detection of OLPC's OFW. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101116220952.26526a80@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:30 -08:00
Andres Salomon	4722d194e6	x86, of: Define irq functions to allow drivers/of/* to build on x86 - Define a stub irq_create_of_mapping for x86 as a stop-gap solution until drivers/of/irq is further along. - Define irq_dispose_mapping for x86 to appease of_i2c.c These are needed to allow stuff in drivers/of/ to build on x86. This stuff will eventually get replaced; quoting Grant, "The long term plan is to have the drivers/of/ code handling the mapping intelligently like powerpc currently does." But for now, just provide these functions. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101111214526.5de7121b@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:16 -08:00
Randy Dunlap	52f6c5ad43	crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h Add missing header file: arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR' arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-12-15 19:44:08 +08:00
Kenji Kaneshige	7f7fbf45c6	x86: Enable the intr-remap fault handling after local APIC setup Interrupt-remapping gets enabled very early in the boot, as it determines the apic mode that the processor can use. And the current code enables the vt-d fault handling before the setup_local_APIC(). And hence the APIC LDR registers and data structure in the memory may not be initialized. So the vt-d fault handling in logical xapic/x2apic modes were broken. Fix this by enabling the vt-d fault handling in the end_local_APIC_setup() A cleaner fix of enabling fault handling while enabling intr-remapping will be addressed for v2.6.38. [ Enabling intr-remapping determines the usage of x2apic mode and the apic mode determines the fault-handling configuration. ] Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <20101201062244.541996375@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: stable@kernel.org [v2.6.32+] Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-13 16:53:32 -08:00
Kenji Kaneshige	086e8ced65	x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode In x2apic mode, we need to set the upper address register of the fault handling interrupt register of the vt-d hardware. Without this irq migration of the vt-d fault handling interrupt is broken. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: stable@kernel.org [v2.6.32+] Acked-by: Chris Wright <chrisw@sous-sol.org> Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-13 16:52:52 -08:00
Suresh Siddha	3fb82d56ad	x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume During suspend, we disable all the non boot cpus. And during resume we bring them all back again. So no need to do alternatives_smp_switch() in between. On my core 2 based laptop, this speeds up the suspend path by 15msec and the resume path by 5 msec (suspend/resume speed up differences can be attributed to the different P-states that the cpu is in during suspend/resume). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-13 16:23:56 -08:00
Suresh Siddha	10340ae130	x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() Alignment of alloc_bootmem() depends on the value of L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment. Use alloc_bootmem_align() and explicitly specify the alignment instead. This fixes a kernel boot crash reported by Jody when the cpu in .config is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU. Reported-by: Jody Bruchon <jody@nctritech.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>	2010-12-13 16:13:11 -08:00
H. Peter Anvin	de2a8cf98e	x86, gcc-4.6: Use gcc -m options when building vdso The vdso Makefile passes linker-style -m options not to the linker but to gcc. This happens to work with earlier gcc, but fails with gcc 4.6. Pass gcc-style -m options, instead. Note: all currently supported versions of gcc supports -m32, so there is no reason to conditionalize it any more. Reported-by: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-*@git.kernel.org> Cc: <stable@kernel.org>	2010-12-13 16:08:37 -08:00
Don Zickus	5f29805a4f	x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabled When adjusting the code to handle removing the old nmi watchdog, I forgot to consider the compile case when the local apic is not enabled. This change fixes the following build error: arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’ Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <20101213153719.GD18577@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-13 18:23:23 +01:00
Thomas Gleixner	f1c18071ad	x86: HPET: Chose a paranoid safe value for the ETIME check commit `995bd3bb5` (x86: Hpet: Avoid the comparator readback penalty) chose 8 HPET cycles as a safe value for the ETIME check, as we had the confirmation that the posted write to the comparator register is delayed by two HPET clock cycles on Intel chipsets which showed readback problems. After that patch hit mainline we got reports from machines with newer AMD chipsets which seem to have an even longer delay. See http://thread.gmane.org/gmane.linux.kernel/1054283 and http://thread.gmane.org/gmane.linux.kernel/1069458 for further information. Boris tried to come up with an ACPI based selection of the minimum HPET cycles, but this failed on a couple of test machines. And of course we did not get any useful information from the hardware folks. For now our only option is to chose a paranoid high and safe value for the minimum HPET cycles used by the ETIME check. Adjust the minimum ns value for the HPET clockevent accordingly. Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6> Cc: Simon Kirby <sim@hostway.ca> Cc: Borislav Petkov <bp@alien8.de> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: John Stultz <johnstul@us.ibm.com>	2010-12-13 13:42:44 +01:00
Tadeusz Struk	3c097b8008	crypto: aesni-intel - Fixed build with binutils 2.16 This patch fixes the problem with 2.16 binutils. Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-12-13 19:51:15 +08:00
Thomas Gleixner	a8760eca6c	x86: Check tsc available/disabled in the delayed init function The delayed TSC init function does not check whether the system has no TSC or TSC is disabled at the kernel command line, which results in a crash in the work queue based extended calibration due to division by zero because the basic calibration never happened. Add the missing checks and do not touch TSC when not available or disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com>	2010-12-13 11:35:05 +01:00
Tejun Heo	0aa002fe60	x86: apic: Cleanup and simplify setup_local_APIC() setup_local_APIC() is used to setup local APIC early during CPU initialization and already assumes that preemption is disabled on entry. However, The function unnecessarily disables and enables preemption and uses smp_processor_id() multiple times in and out of the nested preemption disabled section. This gives the wrong impression that the function might be able to handle being called with preemption enabled and/or migrated to another processor in the middle. Make it clear that the function is always called with preemption disabled, drop the confusing preemption disable block and call smp_processor_id() once at the beginning of the function. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: brgerst@gmail.com LKML-Reference: <4D00B3B9.7060702@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-10 13:46:26 +01:00
Don Zickus	5dc3055879	x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls Originally adapted from Huang Ying's patch which moved the unknown_nmi_panic to the traps.c file. Because the old nmi watchdog was deleted before this change happened, the unknown_nmi_panic sysctl was lost. This re-adds it. Also, the nmi_watchdog sysctl was re-implemented and its documentation updated accordingly. Patch-inspired-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: fweisbec@gmail.com LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-10 00:01:06 +01:00
Don Zickus	96a84c20d6	lockup detector: Compile fixes from removing the old x86 nmi watchdog My patch that removed the old x86 nmi watchdog broke other arches. This change reverts a piece of that patch and puts the change in the correct spot. Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: fweisbec@gmail.com Cc: yinghai@kernel.org LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-10 00:01:06 +01:00
Feng Tang	0e3fa13f4e	x86: Further simplify mp_irq info handling assign_to_mp_irq() is copying the struct mpc_intsrc members one by one. That's silly. Use memcpy() and let the compiler figure it out. Same for the identical function assign_to_mpc_intsrc() mp_irq_mpc_intsrc_cmp() is comparing the struct members one by one, but no caller ever checks the different return codes. Use memcmp() instead. Remove the extra printk in MP_ioapic_info() Signed-off-by: Feng Tang <feng.tang@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Alan Cox <alan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <20101208151857.212f0018@feng-i7> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:06 +01:00
Feng Tang	2d8009ba67	x86: Unify 3 similar ways of saving mp_irqs info There are 3 places defining similar functions of saving IRQ vector info into mp_irqs[] array: mmparse/acpi/mrst. Replace the redundant code by a common function in io_apic.c as it's only called when CONFIG_X86_IO_APIC=y Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20101207133204.4d913c5a@feng-i7> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:06 +01:00
Yinghai Lu	60d79fd99f	x86, ioapic: Avoid writing io_apic id if already correct For 32bit mptable path, setup_ids_from_mpc() always writes the io_apic id register, even there is no change needed. Skip the write, when readout and mptable match. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> LKML-Reference: <4CFDF785.7010401@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	0450193bff	x86, x2apic: Don't map lapic addr for preenabled x2apic systems If x2apic is preenabled and used by the kernel, we don't need to map the lapic address. That mapping will never be used. So just skip that in register_lapic_address() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF69C.9070501@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	53301f36f3	x86, sfi: Use register_lapic_address() register_lapic_address() and mp_sfi_register_lapic_address() are almost identical. Use the common function. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4CFDF693.6000908@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	326a2e6bae	x86, apic: Use register_lapic_address() in init_apic_mapping() Remove the printk as well, we don't want to print when nothing changed. We print in register_lapic_address() already. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF68A.7020902@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	f115714163	x86, apic: Remove early_init_lapic_mapping() It is almost the same as smp_register_lapic_addr(). We just need to let smp_read_mpc() call smp_register_lapic_addr() when early==1. Add the apic_printk to smp_register_lapic_address() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF681.3030509@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:04 +01:00
Yinghai Lu	c0104d38a7	x86, apic: Unify identical register_lapic_address() functions They are the same, move the common function to apic.c to allow further cleanups. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4CFDF675.4060305@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:04 +01:00
Thomas Gleixner	51ddafcbc7	Merge branch 'x86/platform' into x86/apic-cleanups Reason: apic cleanup series depends on x86/apic, x86/amd-nb and x86/platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 18:19:21 +01:00
Thomas Gleixner	d834a9dcec	Merge branch 'x86/amd-nb' into x86/apic-cleanups Reason: apic cleanup series depends on x86/apic, x86/amd-nb x86/platform Conflicts: arch/x86/include/asm/io_apic.h Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 18:17:25 +01:00
Thomas Gleixner	4720dd1b38	x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n arch/x86/kernel/apic/io_apic.c: In function 'ack_apic_level': arch/x86/kernel/apic/io_apic.c:2433: warning: unused variable 'desc' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <201010272107.o9RL7rse018212@imap1.linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 17:43:21 +01:00
Rakib Mullick	2c6cb1053a	x86: Address 'unused' warning in hw_nmi.c again arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog). Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace section again. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 16:06:52 +01:00
Peter Zijlstra	c079c791c5	perf, amd: Remove the nb lock Since all the hotplug stuff is serialized by the hotplug mutex, do away with the amd_nb_lock. Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-08 20:16:30 +01:00
Andre Przywara	73c1160ce3	KVM: enlarge number of possible CPUID leaves Currently the number of CPUID leaves KVM handles is limited to 40. My desktop machine (AthlonII) already has 35 and future CPUs will expand this well beyond the limit. Extend the limit to 80 to make room for future processors. KVM-Stable-Tag. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:38 +02:00
Joerg Roedel	24d1b15f72	KVM: SVM: Do not report xsave in supported cpuid To support xsave properly for the guest the SVM module need software support for it. As long as this is not present do not report the xsave as supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:37 +02:00
Sheng Yang	3ea3aa8cf6	KVM: Fix OSXSAVE after migration CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID after migration. KVM-Stable-Tag. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:31 +02:00
Linus Torvalds	6313e3c217	Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/pvclock: Zero last_value on resume * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf record: Fix eternal wait for stillborn child perf header: Don't assume there's no attr info if no sample ids is provided perf symbols: Figure out start address of kernel map from kallsyms perf symbols: Fix kallsyms kernel/module map splitting * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: Fix printk_needs_cpu() return value on offline cpus printk: Fix wake_up_klogd() vs cpu hotplug	2010-12-08 06:40:59 -08:00
Ingo Molnar	10a18d7dc0	Merge commit 'v2.6.37-rc5' into perf/core Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-07 07:49:51 +01:00
Feng Tang	991cfffa7c	x86, earlyprintk: Move mrst early console to platform/ and fix a typo Move the code to arch/x86/platform/mrst/. Also fix a typo to use the correct config option: ONFIG_EARLY_PRINTK_MRST Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: alan@linux.intel.com LKML-Reference: <1291348298-21263-1-git-send-email-feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 20:52:04 +01:00
Masami Hiramatsu	f984ba4eb5	kprobes: Use text_poke_smp_batch for unoptimizing Use text_poke_smp_batch() on unoptimization path for reducing the number of stop_machine() issues. If the number of unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining probes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:32 +01:00
Masami Hiramatsu	cd7ebe2298	kprobes: Use text_poke_smp_batch for optimizing Use text_poke_smp_batch() in optimization path for reducing the number of stop_machine() issues. If the number of optimizing probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining probes. Changes in v5: - Use kick_kprobe_optimizer() instead of directly calling schedule_delayed_work(). - Rescheduling optimizer outside of kprobe mutex lock. Changes in v2: - Allocate code buffer and parameters in arch_init_kprobes() instead of using static arraies. - Merge previous max optimization limit patch into this patch. So, this patch introduces upper limit of optimization at once. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:31 +01:00
Masami Hiramatsu	7deb18dcf0	x86: Introduce text_poke_smp_batch() for batch-code modifying Introduce text_poke_smp_batch(). This function modifies several text areas with one stop_machine() on SMP. Because calling stop_machine() is heavy task, it is better to aggregate text_poke requests. ( Note: I've talked with Rusty about this interface, and he would not like to expand stop_machine() interface, since it is not for generic use. ) Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:31 +01:00
Masami Hiramatsu	6274de4984	kprobes: Support delayed unoptimizing Unoptimization occurs when a probe is unregistered or disabled, and is heavy because it recovers instructions by using stop_machine(). This patch delays unoptimization operations and unoptimize several probes at once by using text_poke_smp_batch(). This can avoid unexpected system slowdown coming from stop_machine(). Changes in v5: - Split this patch into several cleanup patches and this patch. - Fix some text_mutex lock miss. - Use bool instead of int for behavior flags. - Add additional comment for (un)optimizing path. Changes in v2: - Use dynamic allocated buffers and params. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:30 +01:00
Feng Tang	e4d2ebcab1	x86, apbt: Setup affinity for apb timers acting as per-cpu timer Commit `a5ef2e70` "x86: Sanitize apb timer interrupt handling" forgot the affinity setup when cleaning up the code, this patch just adds the forgotten part Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Jacob Pan <jacob.jun.pan@intel.com> Cc: Alan Cox <alan@linux.intel.com> LKML-Reference: <1291348298-21263-2-git-send-email-feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 15:58:26 +01:00
Dirk Brandewie	5ec6960f6f	ce4100: Add errata fixes for UART on CE4100 This patch enables the UART on the CE4100. The UART has a couple of issues that need to be worked around. First the UART is mostly PC compatible except that it is clocked eight times faster than a standard PC so the default configuration provided in arch/x86/include/asm/serial.h needs to be overridden. Second the TX interrupt may not be set correctly all the time. Lastly accessing the UART via I/O space for early_prink() hangs the chip when the IOAPIC is enabled. A custom mem_serial_in() is provided to work around the TX interrupt issue. The configuration issues are dealt with in the call back registered with the 8250 driver via serial8250_set_isa_configurator() Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> LKML-Reference: <1290436128-17958-1-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 15:58:26 +01:00
Sebastian Andrzej Siewior	a38c5380ef	x86: io_apic: Split setup_ioapic_ids_from_mpc() Sodaville needs to setup the IO_APIC ids as the boot loader leaves them uninitialized. Split out the setter function so it can be called unconditionally from the sodaville board code. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20101126165020.GA26361@www.tglx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 14:30:28 +01:00
Linus Torvalds	11e8896474	Merge branch '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: unplug the emulated devices at resume time xen: fix save/restore for PV on HVM guests with pirq remapping xen: resume the pv console for hvm guests too xen: fix MSI setup and teardown for PV on HVM guests xen: use PHYSDEVOP_get_free_pirq to implement find_unbound_pirq	2010-12-03 11:30:57 -08:00
Linus Torvalds	8338fded13	Merge branches 'upstream/core' and 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: allocate irq descs on any NUMA node xen: prevent crashes with non-HIGHMEM 32-bit kernels with largeish memory xen: use default_idle xen: clean up "extra" memory handling some more * 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: x86/32: perform initial startup on initial_page_table xen: don't bother to stop other cpus on shutdown/reboot	2010-12-03 10:08:52 -08:00
John Stultz	08ec0c58fb	x86: Improve TSC calibration using a delayed workqueue Boot to boot the TSC calibration may vary by quite a large amount. While normal variance of 50-100ppm can easily be seen, the quick calibration code only requires 500ppm accuracy, which is the limit of what NTP can correct for. This can cause problems for systems being used as NTP servers, as every time they reboot it can take hours for them to calculate the new drift error caused by the calibration. The classic trade-off here is calibration accuracy vs slow boot times, as during the calibration nothing else can run. This patch uses a delayed workqueue to calibrate the TSC over the period of a second. This allows very accurate calibration (in my tests only varying by 1khz or 0.4ppm boot to boot). Additionally this refined calibration step does not block the boot process, and only delays the TSC clocksoure registration by a few seconds in early boot. If the refined calibration strays 1% from the early boot calibration value, the system will fall back to already calculated early boot calibration. Credit to Andi Kleen who suggested using a timer quite awhile back, but I dismissed it thinking the timer calibration would be done after the clocksource was registered (which would break things). Forgive me for my short-sightedness. This patch has worked very well in my testing, but TSC hardware is quite varied so it would probably be good to get some extended testing, possibly pushing inclusion out to 2.6.39. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@elte.hu> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Clark Williams <williams@redhat.com> CC: Andi Kleen <andi@firstfloor.org>	2010-12-02 16:48:37 -08:00
John Stultz	b0f969009f	Merge remote branch 'tip/x86/tsc' into fortglx/2.6.38/tip/x86/tsc Conflicts: Documentation/kernel-parameters.txt	2010-12-02 16:47:52 -08:00
Jeremy Fitzhardinge	64141da587	vmalloc: eagerly clear ptes on vunmap On stock 2.6.37-rc4, running: # mount lilith:/export /mnt/lilith # find /mnt/lilith/ -type f -print0 \| xargs -0 file crashes the machine fairly quickly under Xen. Often it results in oops messages, but the couple of times I tried just now, it just hung quietly and made Xen print some rude messages: (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1d7058 (pfn 18fa7) (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp 1000000000000000) for mfn 1d2e04 (pfn 1d1fb) (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04 Which means the domain tried to map a pagetable page RW, which would allow it to map arbitrary memory, so Xen stopped it. This is because vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had finished with them, and those pages got recycled as pagetable pages while still having these RW aliases. Removing those mappings immediately removes the Xen-visible aliases, and so it has no problem with those pages being reused as pagetable pages. Deferring the TLB flush doesn't upset Xen because it can flush the TLB itself as needed to maintain its invariants. When unmapping a region in the vmalloc space, clear the ptes immediately. There's no point in deferring this because there's no amortization benefit. The TLBs are left dirty, and they are flushed lazily to amortize the cost of the IPIs. This specific motivation for this patch is an oops-causing regression since 2.6.36 when using NFS under Xen, triggered by the NFS client's use of vm_map_ram() introduced in `56e4ebf877` ("NFS: readdir with vmapped pages") . XFS also uses vm_map_ram() and could cause similar problems. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-02 14:51:15 -08:00
Stefano Stabellini	512b109ec9	xen: unplug the emulated devices at resume time Early after being resumed we need to unplug again the emulated devices. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-12-02 14:40:53 +00:00
Stefano Stabellini	af42b8d12f	xen: fix MSI setup and teardown for PV on HVM guests When remapping MSIs into pirqs for PV on HVM guests, qemu is responsible for doing the actual mapping and unmapping. We only give qemu the desired pirq number when we ask to do the mapping the first time, after that we should be reading back the pirq number from qemu every time we want to re-enable the MSI. This fixes a bug in xen_hvm_setup_msi_irqs that manifests itself when trying to enable the same MSI for the second time: the old MSI to pirq mapping is still valid at this point but xen_hvm_setup_msi_irqs would try to assign a new pirq anyway. A simple way to reproduce this bug is to assign an MSI capable network card to a PV on HVM guest, if the user brings down the corresponding ethernet interface and up again, Linux would fail to enable MSIs on the device. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-12-02 14:34:25 +00:00
Ian Campbell	805e3f4950	xen: x86/32: perform initial startup on initial_page_table Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both intial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-29 17:07:53 -08:00
Jeremy Fitzhardinge	31e323cca9	xen: don't bother to stop other cpus on shutdown/reboot Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's no need to do it manually. In any case it will fail because all the IPI irqs have been pulled down by this point, so the cross-CPU calls will simply hang forever. Until change `76fac077db` the function calls were not synchronously waited for, so this wasn't apparent. However after that change the calls became synchronous leading to a hang on shutdown on multi-VCPU guests. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Alok Kataria <akataria@vmware.com>	2010-11-29 14:16:53 -08:00
Mathias Krause	559ad0ff13	crypto: aesni-intel - Fixed build error on x86-32 Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers not available on x86-32. While at it, fixed unregister order in aesni_exit(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-11-29 08:35:39 +08:00
Linus Torvalds	a9e40a2493	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix the software context switch counter perf, x86: Fixup Kconfig deps x86, perf, nmi: Disable perf if counters are not accessible perf: Fix inherit vs. context rotation bug	2010-11-28 12:25:02 -08:00
Jeremy Fitzhardinge	e7a3481c02	x86/pvclock: Zero last_value on resume If the guest domain has been suspend/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-28 09:33:20 +01:00
Mathias Krause	0d258efb6a	crypto: aesni-intel - Ported implementation to x86-32 The AES-NI instructions are also available in legacy mode so the 32-bit architecture may profit from those, too. To illustrate the performance gain here's a short summary of a dm-crypt speed test on a Core i7 M620 running at 2.67GHz comparing both assembler implementations: x86: i568 aes-ni delta ECB, 256 bit: 93.8 MB/s 123.3 MB/s +31.4% CBC, 256 bit: 84.8 MB/s 262.3 MB/s +209.3% LRW, 256 bit: 108.6 MB/s 222.1 MB/s +104.5% XTS, 256 bit: 105.0 MB/s 205.5 MB/s +95.7% Additionally, due to some minor optimizations, the 64-bit version also got a minor performance gain as seen below: x86-64: old impl. new impl. delta ECB, 256 bit: 121.1 MB/s 123.0 MB/s +1.5% CBC, 256 bit: 285.3 MB/s 290.8 MB/s +1.9% LRW, 256 bit: 263.7 MB/s 265.3 MB/s +0.6% XTS, 256 bit: 251.1 MB/s 255.3 MB/s +1.7% Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-11-27 16:34:46 +08:00
Linus Torvalds	fbe6c4047f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: dmar, x86: Use function stubs when CONFIG_INTR_REMAP is disabled x86-64: Fix and clean up AMD Fam10 MMCONF enabling x86: UV: Address interrupt/IO port operation conflict x86: Use online node real index in calulate_tbl_offset() x86, asm: Fix binutils 2.15 build failure	2010-11-27 07:28:47 +09:00
Linus Torvalds	d2f30c73ab	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf symbols: Remove incorrect open-coded container_of() perf record: Handle restrictive permissions in /proc/{kallsyms,modules} x86/kprobes: Prevent kprobes to probe on save_args() irq_work: Drop cmpxchg() result perf: Fix owner-list vs exit x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG tracing: Fix recursive user stack trace perf,hw_breakpoint: Initialize hardware api earlier x86: Ignore trap bits on single step exceptions tracing: Force arch_local_irq_* notrace for paravirt tracing: Fix module use of trace_bprintk()	2010-11-27 07:28:17 +09:00
Cyrill Gorcunov	af86da5318	perf, x86: P4 PMU - describe config format Add description of .config in a sake of RAW events. At least this should bring some light to those who will be reading this code. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:57 +01:00
Peter Zijlstra	004417a6d4	perf, arch: Cleanup perf-pmu init vs lockup-detector The perf hardware pmu got initialized at various points in the boot, some before early_initcall() some after (notably arch_initcall). The problem is that the NMI lockup detector is ran from early_initcall() and expects the hardware pmu to be present. Sanitize this by moving all architecture hardware pmu implementations to initialize at early_initcall() and move the lockup detector to an explicit initcall right after that. Cc: paulus <paulus@samba.org> Cc: davem <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290707759.2145.119.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:56 +01:00
Andi Kleen	5ef428c4b5	x86: Set cpu masks before calling CPU_STARTING notifiers When booting up a CPU set the various topology masks before calling the CPU_STARTING notifier. This way the notifier can actually use the masks. This is needed for a perf change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:56 +01:00
Franck Bui-Huu	6c7e550f13	perf: Introduce is_sampling_event() and use it when appropriate. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:54 +01:00
Ingo Molnar	6c869e772c	Merge branch 'perf/urgent' into perf/core Conflicts: arch/x86/kernel/apic/hw_nmi.c Merge reason: Resolve conflict, queue up dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:07:02 +01:00
Ingo Molnar	e4e91ac410	Merge commit 'v2.6.37-rc3' into perf/core Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:04:47 +01:00
Peter Zijlstra	cc2067a514	perf, x86: Fixup Kconfig deps This leads to a Kconfig dep inversion, x86 selects PERF_EVENT (due to a hw_breakpoint dep) but doesn't unconditionally provide HAVE_PERF_EVENT. (This can cause build failures on M386/M486 kernel .config's.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222055.982965150@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:00:58 +01:00
Don Zickus	33c6d6a7ad	x86, perf, nmi: Disable perf if counters are not accessible In a kvm virt guests, the perf counters are not emulated. Instead they return zero on a rdmsrl. The perf nmi handler uses the fact that crossing a zero means the counter overflowed (for those counters that do not have specific interrupt bits). Therefore on kvm guests, perf will swallow all NMIs thinking the counters overflowed. This causes problems for subsystems like kgdb which needs NMIs to do its magic. This problem was discovered by running kgdb tests. The solution is to write garbage into a perf counter during the initialization and hopefully reading back the same number. On kvm guests, the value will be read back as zero and we disable perf as a result. Reported-by: Jason Wessel <jason.wessel@windriver.com> Patch-inspired-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1290462923-30734-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:00:57 +01:00
Linus Torvalds	8a3fbc9fdb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: remove duplicated #include xen: x86/32: perform initial startup on initial_page_table	2010-11-25 08:35:53 +09:00
Andrew Morton	91d95fda85	arch/x86/include/asm/fixmap.h: mark __set_fixmap_offset as __always_inline When compiling arch/x86/kernel/early_printk_mrst.c with i386 allmodconfig, gcc-4.1.0 generates an out-of-line copy of __set_fixmap_offset() which contains a reference to __this_fixmap_does_not_exist which the compiler cannot elide. Marking __set_fixmap_offset() as __always_inline prevents this. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Feng Tang <feng.tang@intel.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-25 06:50:49 +09:00
Huang Weiyi	e6d4a76dbf	xen: remove duplicated #include Remove duplicated #include('s) in arch/x86/xen/setup.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-11-24 12:07:45 -05:00
Ian Campbell	5b5c1af104	xen: x86/32: perform initial startup on initial_page_table Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both intial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-11-24 12:07:44 -05:00
Linus Torvalds	a4ec046c98	Merge branch 'upstream/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (23 commits) xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs. xen: set IO permission early (before early_cpu_init()) xen: re-enable boot-time ballooning xen/balloon: make sure we only include remaining extra ram xen/balloon: the balloon_lock is useless xen: add extra pages to balloon xen: make evtchn's name less generic xen/evtchn: the evtchn device is non-seekable Revert "xen/privcmd: create address space to allow writable mmaps" xen/events: use locked set\|clear_bit() for cpu_evtchn_mask xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore xen/xenfs: update xenfs_mount for new prototype xen: fix header export to userspace xen: implement XENMEM_machphys_mapping xen: set vma flag VM_PFNMAP in the privcmd mmap file_op xen: xenfs: privcmd: check put_user() return code xen/evtchn: add missing static xen/evtchn: Fix name of Xen event-channel device xen/evtchn: don't do unbind_from_irqhandler under spinlock xen/evtchn: remove spurious barrier ...	2010-11-24 08:23:18 +09:00
Jeremy Fitzhardinge	bc15fde77f	xen: use default_idle We just need the idle loop to drop into safe_halt, which default_idle() is perfectly capable of doing. There's no need to duplicate it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-22 17:19:34 -08:00
Jeremy Fitzhardinge	c2d0879112	xen: clean up "extra" memory handling some more Make sure that extra_pages is added for all E820_RAM regions beyond mem_end - completely excluded regions as well as the remains of partially included regions. Also makes sure the extra region is not unnecessarily high, and simplifies the logic to decide which regions should be added. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-22 16:34:28 -08:00
Jeremy Fitzhardinge	9b8321531a	Merge branches 'upstream/core', 'upstream/xenfs' and 'upstream/evtchn' into upstream/for-linus * upstream/core: xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs. xen: set IO permission early (before early_cpu_init()) xen: re-enable boot-time ballooning xen/balloon: make sure we only include remaining extra ram xen/balloon: the balloon_lock is useless xen: add extra pages to balloon xen/events: use locked set\|clear_bit() for cpu_evtchn_mask xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore xen: implement XENMEM_machphys_mapping * upstream/xenfs: Revert "xen/privcmd: create address space to allow writable mmaps" xen/xenfs: update xenfs_mount for new prototype xen: fix header export to userspace xen: set vma flag VM_PFNMAP in the privcmd mmap file_op xen: xenfs: privcmd: check put_user() return code * upstream/evtchn: xen: make evtchn's name less generic xen/evtchn: the evtchn device is non-seekable xen/evtchn: add missing static xen/evtchn: Fix name of Xen event-channel device xen/evtchn: don't do unbind_from_irqhandler under spinlock xen/evtchn: remove spurious barrier xen/evtchn: ports start enabled xen/evtchn: dynamically allocate port_user array xen/evtchn: track enabled state for each port	2010-11-22 12:22:42 -08:00
Konrad Rzeszutek Wilk	ec35a69c46	xen: set IO permission early (before early_cpu_init()) This patch is based off "xen dom0: Set up basic IO permissions for dom0." by Juan Quintela <quintela@redhat.com>. On AMD machines when we boot the kernel as Domain 0 we get this nasty: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1-101116 x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff8130271b>] (XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest (XEN) rax: 000000008000c068 rbx: ffffffff8186c680 rcx: 0000000000000068 (XEN) rdx: 0000000000000cf8 rsi: 000000000000c000 rdi: 0000000000000000 (XEN) rbp: ffffffff81801e98 rsp: ffffffff81801e50 r8: ffffffff81801eac (XEN) r9: ffffffff81801ea8 r10: ffffffff81801eb4 r11: 00000000ffffffff (XEN) r12: ffffffff8186c694 r13: ffffffff81801f90 r14: ffffffffffffffff (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000221803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801e50: RIP points to read_pci_config() function. The issue is that we don't set IO permissions for the Linux kernel early enough. The call sequence used to be: xen_start_kernel() x86_init.oem.arch_setup = xen_setup_arch; setup_arch: - early_cpu_init - early_init_amd - read_pci_config - x86_init.oem.arch_setup [ xen_arch_setup ] - set IO permissions. We need to set the IO permissions earlier on, which this patch does. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-11-22 12:10:31 -08:00
Lin Ming	691513f70d	x86: Resume trampoline must be executable commit 5bd5a452(x86: Add NX protection for kernel data) marked the trampoline area NX - which unsurprisingly breaks resume and cpu hotplug. Revert the portion of that commit, which touches the trampoline. Originally-from: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com> Cc: Matthieu Castet <castet.matthieu@free.fr> Cc: Siarhei Liakh <sliakh.lkml@gmail.com> Cc: Xuxian Jiang <jiang@cs.ncsu.edu> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Tested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-22 14:38:52 +01:00
Thomas Gleixner	9cdca86972	x86: platform: Move iris to x86/platform where it belongs Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-20 10:37:05 +01:00
Jeremy Fitzhardinge	d2a817130c	xen: re-enable boot-time ballooning Now that the balloon driver doesn't stumble over non-RAM pages, we can enable the extra space for ballooning. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-19 23:28:08 -08:00
Vasiliy Kulikov	5ca9afdb9f	x86, mrst: Check platform_device_register() return code platform_device_register() may fail, if so propagate the return code from mrst_device_create(). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> LKML-Reference: <1290104207-31279-1-git-send-email-segoon@openwall.com> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-18 13:45:46 -08:00
Ingo Molnar	ae51ce9061	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-11-18 20:07:12 +01:00
Linus Torvalds	fb3ff69d13	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Fix host userspace gsbase corruption KVM: Correct ordering of ldt reload wrt fs/gs reload	2010-11-18 09:45:47 -08:00
Linus Torvalds	2d42dc3feb	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb,ppc: Fix regression in evr register handling kgdb,x86: fix regression in detach handling kdb: fix crash when KDB_BASE_CMD_MAX is exceeded kdb: fix memory leak in kdb_main.c	2010-11-18 08:24:58 -08:00
Hans Rosenfeld	f658bcfb26	x86, cacheinfo: Cleanup L3 cache index disable support Adaptions to the changes of the AMD northbridge caching code: instead of a bool in each l3 struct, use a flag in amd_northbridges.flags to indicate L3 cache index disable support; use a pointer to the whole northbridge instead of the misc device in the l3 struct; simplify the initialisation; dynamically generate sysfs attribute array. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:06 +01:00
Hans Rosenfeld	9653a5c76c	x86, amd-nb: Cleanup AMD northbridge caching code Support more than just the "Misc Control" part of the northbridges. Support more flags by turning "gart_supported" into a single bit flag that is stored in a flags member. Clean up related code by using a set of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb()) instead of accessing the NB data structures directly. Reorder the initialization code and put the GART flush words caching in a separate function. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:05 +01:00
Hans Rosenfeld	eec1d4fa00	x86, amd-nb: Complete the rename of AMD NB and related code Not only the naming of the files was confusing, it was even more so for the function and variable names. Renamed the K8 NB and NUMA stuff that is also used on other AMD platforms. This also renames the CONFIG_K8_NUMA option to CONFIG_AMD_NUMA and the related file k8topology_64.c to amdtopology_64.c. No functional changes intended. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:04 +01:00
Soeren Sandmann Pedersen	9c0729dc80	x86: Eliminate bp argument from the stack tracing routines The various stack tracing routines take a 'bp' argument in which the caller is supposed to provide the base pointer to use, or 0 if doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not defined, this means all callers in principle should either always pass 0, or be conditional on CONFIG_FRAME_POINTER. However, there are only really three use cases for stack tracing: (a) Trace the current task, including IRQ stack if any (b) Trace the current task, but skip IRQ stack (c) Trace some other task In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just be 0. If it _is_ defined, then - in case (a) bp should be gotten directly from the CPU's register, so the caller should pass NULL for regs, - in case (b) the caller should should pass the IRQ registers to dump_trace(), - in case (c) bp should be gotten from the top of the task's stack, so the caller should pass NULL for regs. Hence, the bp argument is not necessary because the combination of task and regs is sufficient to determine an appropriate value for bp. This patch introduces a new inline function stack_frame(task, regs) that computes the desired bp. This function is then called from the two versions of dump_stack(). Signed-off-by: Soren Sandmann <ssp@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org>, Cc: Frederic Weisbecker <fweisbec@gmail.com>, Cc: Arnaldo Carvalho de Melo <acme@redhat.com>, LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-11-18 14:37:34 +01:00
Jan Beulich	37db6c8f1d	x86-64: Fix and clean up AMD Fam10 MMCONF enabling Candidate memory ranges were not calculated properly (start addresses got needlessly rounded down, and end addresses didn't get rounded up at all), address comparison for secondary CPUs was done on only part of the address, and disabled status wasn't tracked properly. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <4CE24DF40200007800022737@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:41:35 +01:00
Masami Hiramatsu	de31ec8a31	x86/kprobes: Prevent kprobes to probe on save_args() Prevent kprobes to probe on save_args() since this function will be called from breakpoint exception handler. That will cause infinit loop on breakpoint handling. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:40:19 +01:00
matthieu castet	84e1c6bb38	x86: Add RO/NX protection for loadable kernel modules This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section: 1. Code: RO+X 2. RO data: RO+NX 3. RW data: RW+NX In order to achieve proper protection, layout_sections() have been modified to align each of the three parts mentioned above onto page boundary. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free(). By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages. -v1: Initial proof-of-concept patch. -v2: The patch have been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting has also been corrected. -v3: Opportunistic RO/NX protection is now unconditional. Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y. -v4: Removed most macros and improved coding style. -v5: Changed page-alignment and RO/NX section size calculation -v6: Fixed comments. Restricted RO/NX enforcement to x86 only -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and set_all_modules_text_ro() in ftrace -v8: updated for compatibility with linux 2.6.33-rc5 -v9: coding style fixes -v10: more coding style fixes -v11: minor adjustments for -tip -v12: minor adjustments for v2.6.35-rc2-tip -v13: minor adjustments for v2.6.37-rc1-tip Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F914.9070106@free.fr> [ minor cleanliness edits, -v14: build failure fix ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:32:56 +01:00
Matthieu Castet	5bd5a45266	x86: Add NX protection for kernel data This patch expands functionality of CONFIG_DEBUG_RODATA to set main (static) kernel data area as NX. The following steps are taken to achieve this: 1. Linker script is adjusted so .text always starts and ends on a page bound 2. Linker script is adjusted so .rodata always start and end on a page boundary 3. NX is set for all pages from _etext through _end in mark_rodata_ro. 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c 5. bios rom is set to x when pcibios is used. The results of patch application may be observed in the diff of kernel page table dumps: pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc00a0000 640K RW GLB NX pte +0xc00a0000-0xc0100000 384K RW GLB x pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte No pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc0100000 1M RW GLB NX pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte The patch has been originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>. -v1: initial patch for 2.6.30 -v2: patch for 2.6.31-rc7 -v3: moved all code into arch/x86, adjusted credits -v4: fixed ifdef, removed credits from CREDITS -v5: fixed an address calculation bug in mark_nxdata_nx() -v6: added acked-by and PT dump diff to commit log -v7: minor adjustments for -tip -v8: rework with the merge of "Set first MB as RW+NX" Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 12:52:04 +01:00
matthieu castet	64edc8ed5f	x86: Fix improper large page preservation This patch fixes a bug in try_preserve_large_page() which may result in improper large page preservation and improper application of page attributes to the memory area outside of the original change request. More specifically, the problem manifests itself when set_memory_*() is called for several pages at the beginning of the large page and try_preserve_large_page() erroneously concludes that the change can be applied to whole large page. The fix consists of 3 parts: 1. Addition of "required" protection attributes in static_protections(), so .data and .bss can be guaranteed to stay "RW" 2. static_protections() is now called for every small page within large page to determine compatibility of new protection attributes (instead of just small pages within the requested range). 3. Large page can be preserved only if attribute change is large-page-aligned and covers whole large page. -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2 -v2: Replaced pfn check with address check for kernel rw-data Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F7F3.8030809@free.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 12:52:04 +01:00
Dimitri Sivanich	8191c9f692	x86: UV: Address interrupt/IO port operation conflict This patch for SGI UV systems addresses a problem whereby interrupt transactions being looped back from a local IOH, through the hub to a local CPU can (erroneously) conflict with IO port operations and other transactions. To workaound this we set a high bit in the APIC IDs used for interrupts. This bit appears to be ignored by the sockets, but it avoids the conflict in the hub. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> LKML-Reference: <20101116222352.GA8155@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> ___ arch/x86/include/asm/uv/uv_hub.h \| 4 ++++ arch/x86/include/asm/uv/uv_mmrs.h \| 19 ++++++++++++++++++- arch/x86/kernel/apic/x2apic_uv_x.c \| 25 +++++++++++++++++++++++-- arch/x86/platform/uv/tlb_uv.c \| 2 +- arch/x86/platform/uv/uv_time.c \| 4 +++- 5 files changed, 49 insertions(+), 5 deletions(-)	2010-11-18 10:41:25 +01:00
Ingo Molnar	fcf48a725a	Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent	2010-11-18 10:37:51 +01:00
Yinghai Lu	9223081f54	x86: Use online node real index in calulate_tbl_offset() Found a NUMA system that doesn't have RAM installed at the first socket which hangs while executing init scripts. bisected it to: \| commit `9329672021` \| Author: Shaohua Li <shaohua.li@intel.com> \| Date: Wed Oct 20 11:07:03 2010 +0800 \| \| x86: Spread tlb flush vector between nodes It turns out when first socket is not online it could have cpus on node1 tlb_offset set to bigger than NUM_INVALIDATE_TLB_VECTORS. That could affect systems like 4 sockets, but socket 2 doesn't have installed, sockets 3 will get too big tlb_offset. Need to use real online node idx. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Shaohua Li <shaohua.li@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CDEDE59.40603@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 10:10:50 +01:00
Shérab	82148d1d0b	x86/platform: Add Eurobraille/Iris power off support The Iris machines from Eurobraille do not have APM or ACPI support to shut themselves down properly. A special I/O sequence is needed to do so. This modle runs this I/O sequence at kernel shutdown when its force parameter is set to 1. Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> [ did minor coding style edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 10:03:24 +01:00
Kees Cook	79250af2d5	x86: Fix included-by file reference comments Adjust the paths for files that are including verify_cpu.S. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> LKML-Reference: <1289931004-16066-1-git-send-email-kees.cook@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:58:54 +01:00
Tetsuo Handa	96e612ffc3	x86, asm: Fix binutils 2.15 build failure Add parentheses around one pushl_cfi argument. Commit `df5d1874` "x86: Use {push,pop}{l,q}_cfi in more places" caused GNU assembler 2.15 (Debian Sarge) to fail. It is still failing as of commit `07bd8516` "x86, asm: Restore parentheses around one pushl_cfi argument". This patch solves build failure with GNU assembler 2.15. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Jan Beulich <jbeulich@novell.com> Cc: heukelum@fastmail.fm Cc: hpa@linux.intel.com LKML-Reference: <201011160445.oAG4jGif079860@www262.sakura.ne.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:25:11 +01:00
Rakib Mullick	0e2af2a9ab	x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG backtrace_mask has been used under the code context of ARCH_HAS_NMI_WATCHDOG. So put it into that context. We were warned by the following warning: arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Don Zickus <dzickus@redhat.com> LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:15:12 +01:00
Don Zickus	072b198a4a	x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog Now that the bulk of the old nmi_watchdog is gone, remove all the stub variables and hooks associated with it. This touches lots of files mainly because of how the io_apic nmi_watchdog was implemented. Now that the io_apic nmi_watchdog is forever gone, remove all its fingers. Most of this code was not being exercised by virtue of nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything to risky here. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:08:23 +01:00
Don Zickus	5f2b0ba4d9	x86, nmi_watchdog: Remove the old nmi_watchdog Now that we have a new nmi_watchdog that is more generic and sits on top of the perf subsystem, we really do not need the old nmi_watchdog any more. In addition, the old nmi_watchdog doesn't really work if you are using the default clocksource, hpet. The old nmi_watchdog code relied on local apic interrupts to determine if the cpu is still alive. With hpet as the clocksource, these interrupts don't increment any more and the old nmi_watchdog triggers false postives. This piece removes the old nmi_watchdog code and stubs out any variables and functions calls. The stubs are the same ones used by the new nmi_watchdog code, so it should be well tested. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:08:23 +01:00
Ingo Molnar	a89d4bd055	Merge branch 'tip/perf/urgent-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent	2010-11-18 08:07:36 +01:00
Avi Kivity	c8770e7ba6	KVM: VMX: Fix host userspace gsbase corruption We now use load_gs_index() to load gs safely; unfortunately this also changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted in confusion and breakage running 32-bit host userspace on a 64-bit kernel. Fix by - saving guest MSR_KERNEL_GS_BASE before we we reload the host's gs - doing the host save/load unconditionally, instead of only when in guest long mode Things can be cleaned up further, but this is the minmal fix for now. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:48:05 -02:00
Avi Kivity	0a77fe4c18	KVM: Correct ordering of ldt reload wrt fs/gs reload If fs or gs refer to the ldt, they must be reloaded after the ldt. Reorder the code to that effect. Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix a user-visible bug. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:47:59 -02:00
Jason Wessel	10a6e67648	kgdb,x86: fix regression in detach handling The fix from `ba773f7c51` (x86,kgdb: Fix hw breakpoint regression) was not entirely complete. The kgdb_remove_all_hw_break() function also needs to call the hw_break_release_slot() or else a breakpoint can get activated again after the debugger has detached. The kgdb test suite exposes the behavior in the form of either a hang or repetitive failure. The kernel config that exposes the problem contains all of the following: CONFIG_DEBUG_RODATA=y CONFIG_KGDB_TESTS=y CONFIG_KGDB_TESTS_ON_BOOT=y CONFIG_KGDB_TESTS_BOOT_STRING="V1F100" Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Tested-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-11-17 13:54:57 -06:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Randy Dunlap	ad02519a0d	x86, mrst: Fix dependencies of "select INTEL_SCU_IPC" commit `b9fc71f47` (x86, mrst: The shutdown for MRST requires the SCU IPC mechanism) introduced the following warning: warning: (X86_MRST && PCI && PCI_GOANY && X86_32 && X86_EXTENDED_PLATFORM && X86_IO_APIC) selects INTEL_SCU_IPC which has unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_MRST) which is due to the hierarchical menu structure. Select X86_PLATFORM_DEVICES as well. Originally-from: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20101115101406.77e072ef.randy.dunlap@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>	2010-11-17 13:41:44 +01:00
Alan Cox	b9fc71f47d	x86, mrst: The shutdown for MRST requires the SCU IPC mechanism Fix the build failure reported by Randy. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101115173110.6877.83958.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-17 13:27:44 +01:00
Jeremy Fitzhardinge	20b4755e4f	Merge commit 'v2.6.37-rc2' into upstream/xenfs * commit 'v2.6.37-rc2': (10093 commits) Linux 2.6.37-rc2 capabilities/syslog: open code cap_syslog logic to fix build failure i2c: Sanity checks on adapter registration i2c: Mark i2c_adapter.id as deprecated i2c: Drivers shouldn't include <linux/i2c-id.h> i2c: Delete unused adapter IDs i2c: Remove obsolete cleanup for clientdata include/linux/kernel.h: Move logging bits to include/linux/printk.h Fix gcc 4.5.1 miscompiling drivers/char/i8k.c (again) hwmon: (w83795) Check for BEEP pin availability hwmon: (w83795) Clear intrusion alarm immediately hwmon: (w83795) Read the intrusion state properly hwmon: (w83795) Print the actual temperature channels as sources hwmon: (w83795) List all usable temperature sources hwmon: (w83795) Expose fan control method hwmon: (w83795) Fix fan control mode attributes hwmon: (lm95241) Check validity of input values hwmon: Change mail address of Hans J. Koch PCI: sysfs: fix printk warnings GFS2: Fix inode deallocation race ...	2010-11-16 11:06:22 -08:00
Linus Torvalds	e5c13537b0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: sysfs: fix printk warnings PCI: fix pci_bus_alloc_resource() hang, prefer positive decode PCI: read current power state at enable time PCI: fix size checks for mmap() on /proc/bus/pci files x86/PCI: coalesce overlapping host bridge windows PCI hotplug: ibmphp: Add check to prevent reading beyond mapped area	2010-11-15 14:01:33 -08:00
Tadeusz Struk	0bd82f5f63	crypto: aesni-intel - RFC4106 AES-GCM Driver Using Intel New Instructions This patch adds an optimized RFC4106 AES-GCM implementation for 64-bit kernels. It supports 128-bit AES key size. This leverages the crypto AEAD interface type to facilitate a combined AES & GCM operation to be implemented in assembly code. The assembly code leverages Intel(R) AES New Instructions and the PCLMULQDQ instruction. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Erdinc Ozturk <erdinc.ozturk@intel.com> Signed-off-by: James Guilford <james.guilford@intel.com> Signed-off-by: Wajdi Feghali <wajdi.k.feghali@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-11-13 21:47:55 +09:00
Linus Torvalds	891cbd30ef	Merge branch 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: do not release any memory under 1M in domain 0 xen: events: do not unmask event channels on resume xen: correct size of level2_kernel_pgt	2010-11-12 16:01:55 -08:00
Linus Torvalds	b5c5510436	Merge branch 'stable/xen-pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/xen-pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: MAINTAINERS: Mark XEN lists as moderated xen-pcifront: fix PCI reference leak xen-pcifront: Remove duplicate inclusion of headers. xen: fix memory leak in Xen PCI MSI/MSI-X allocator. MAINTAINERS: Update mailing list name for Xen pieces.	2010-11-12 15:54:39 -08:00
Ian Campbell	7e77506a59	xen: implement XENMEM_machphys_mapping This hypercall allows Xen to specify a non-default location for the machine to physical mapping. This capability is used when running a 32 bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to exactly the size required. [ Impact: add Xen hypercall definitions ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-11-12 15:00:06 -08:00
Linus Torvalds	25a34554d6	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pvclock: Remove leftover scale_delta() function x86, apic: Remove double #include x86: Adjust section annotations in AMD Fam10 MMCONF enabling code x86, UV: Update node controller MMRs x86: Remove unnecessary casts of void ptr returning alloc function return values x86: Address gcc4.6 "set but not used" warnings in apic.h x86, mm: Fix section mismatch in tlb.c	2010-11-12 08:40:23 -08:00
Frederic Weisbecker	6c0aca288e	x86: Ignore trap bits on single step exceptions When a single step exception fires, the trap bits, used to signal hardware breakpoints, are in a random state. These trap bits might be set if another exception will follow, like a breakpoint in the next instruction, or a watchpoint in the previous one. Or there can be any junk there. So if we handle these trap bits during the single step exception, we are going to handle an exception twice, or we are going to handle junk. Just ignore them in this case. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332 Reported-by: Michael Stefaniuc <mstefani@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Alexandre Julliard <julliard@winehq.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: All since 2.6.33.x <stable@kernel.org>	2010-11-12 14:51:01 +01:00
Dirk Brandewie	37bc9f5078	x86: Ce4100: Add reboot_fixup() for CE4100 This patch adds the CE4100 reboot fixup to reboot_fixups_32.c [ tglx: Moved PCI id to reboot_fixups_32.c ] Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-12 00:45:41 +01:00
Dirk Brandewie	91d8037f56	ce4100: Add PCI register emulation for CE4100 This patch provides access methods for PCI registers that mis-behave on the CE4100. Each register can be assigned a private init, read and write routine. The exception to this is the bridge device. The bridge device is the only device on bus zero (0) that requires any fixup so it is a special case. [ tglx: minor coding style cleanups, __init annotation and simplification of ce4100_conf_read/write ] Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> LKML-Reference: <40b6751381c2275dc359db5a17989cce22ad8db7.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-12 00:45:41 +01:00
Thomas Gleixner	c751e17b53	x86: Add CE4100 platform support Add CE4100 platform support. CE4100 needs early setup like moorestown. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> LKML-Reference: <94720fd7f5564a12ebf202cf2c4f4c0d619aab35.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-12 00:45:41 +01:00
Stefano Stabellini	e060e7af98	xen: set vma flag VM_PFNMAP in the privcmd mmap file_op Set VM_PFNMAP in the privcmd mmap file_op, rather than later in xen_remap_domain_mfn_range when it is too late because vma_wants_writenotify has already been called and vm_page_prot has already been modified. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-11 12:37:43 -08:00
Bjorn Helgaas	4723d0f2f9	x86/PCI: coalesce overlapping host bridge windows Some BIOSes provide PCI host bridge windows that overlap, e.g., pci_root PNP0A03:00: host bridge window [mem 0xb0000000-0xffffffff] pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xdfffffff] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff] If we simply insert these as children of iomem_resource, the second window fails because it conflicts with the first, and the third is inserted as a child of the first, i.e., b0000000-ffffffff PCI Bus 0000:00 f0000000-ffffffff PCI Bus 0000:00 When we claim PCI device resources, this can cause collisions like this if we put them in the first window: pci 0000:00:01.0: address space collision: [mem 0xff300000-0xff4fffff] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xffffffff] Host bridge windows are top-level resources by definition, so it doesn't make sense to make the third window a child of the first. This patch coalesces any host bridge windows that overlap. For the example above, the result is this single window: pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xffffffff] This fixes a 2.6.34 regression. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=17011 Reported-and-tested-by: Anisse Astier <anisse@astier.eu> Reported-and-tested-by: Pramod Dematagoda <pmd.lotr.gandalf@gmail.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-11-11 09:34:31 -08:00
Feng Tang	6f207e9bb4	x86: mrst: Set vRTC's IRQ to level trigger type When setting up the mpc_intsrc structure for vRTC's IRQ, we need to set its irqflag to level trigger, otherwise it will be taken as edge triggered and the vRTC IRQ will fire only once, as there is never a EOI issued from the IA core for it. The original code worked in previous kernel. This is because it was configured to level trigger type by luck. It fell into the default PCI trigger category which is level triggered. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101111155019.12924.569.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 17:43:18 +01:00
Vinod Koul	86071535f8	x86: mrst: Add audio driver bindings This patch adds the sound card bindings for Moorestown (pmic_audio) and the Medfield platform (msic_audio) as IPC devices. This ensures they will be created at the right time. Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101110174044.11340.78008.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:28 +01:00
Feng Tang	0146f26145	rtc: Add drivers/rtc/rtc-mrst.c Provide the standard kernel rtc driver interface on top of the vrtc layer added in the previous patch. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <20101110172911.3311.20593.stgit@localhost.localdomain> [Fixed swapped arguments on IPC] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> [Cleaned up and the device creation moved to arch/x86/platform] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:27 +01:00
Feng Tang	7309282c90	x86: mrst: Add vrtc driver which serves as a wall clock device Moorestown platform doesn't have a m146818 RTC device like traditional x86 PC, but a firmware emulated virtual RTC device(vrtc), which provides some basic RTC functions like get/set time. vrtc serves as the only wall clock device on Moorestown platform. [ tglx: Changed the exports to _GPL ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101110172837.3311.40483.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:27 +01:00
Alek Du	cfb505a7eb	x86: mrst: Add Moorestown specific reboot/shutdown support Moorestowns needs to use a special IPC command to reboot or shutdown the platform. Signed-off-by: Alek Du <alek.du@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101110164928.6365.94243.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:27 +01:00
Steven Rostedt	b590854853	tracing: Force arch_local_irq_* notrace for paravirt When running ktest.pl randconfig tests, I would sometimes trigger a lockdep annotation bug (possible reason: unannotated irqs-on). This triggering happened right after function tracer self test was executed. After doing a config bisect I found that this was caused with having function tracer, paravirt guest, prove locking, and rcu torture all enabled. The rcu torture just enhanced the likelyhood of triggering the bug. Prove locking was needed, since it was the thing that was bugging. Function tracer would trace and disable interrupts in all sorts of funny places. paravirt guest would turn arch_local_irq_* into functions that would be traced. Besides the fact that tracing arch_local_irq_* is just a bad idea, this is what is happening. The bug happened simply in the local_irq_restore() code: if (raw_irqs_disabled_flags(flags)) { \ raw_local_irq_restore(flags); \ trace_hardirqs_off(); \ } else { \ trace_hardirqs_on(); \ raw_local_irq_restore(flags); \ } \ The raw_local_irq_restore() was defined as arch_local_irq_restore(). Now imagine, we are about to enable interrupts. We go into the else case and call trace_hardirqs_on() which tells lockdep that we are enabling interrupts, so it sets the current->hardirqs_enabled = 1. Then we call raw_local_irq_restore() which calls arch_local_irq_restore() which gets traced! Now in the function tracer we disable interrupts with local_irq_save(). This is fine, but flags is stored that we have interrupts disabled. When the function tracer calls local_irq_restore() it does it, but this time with flags set as disabled, so we go into the if () path. This keeps interrupts disabled and calls trace_hardirqs_off() which sets current->hardirqs_enabled = 0. When the tracer is finished and proceeds with the original code, we enable interrupts but leave current->hardirqs_enabled as 0. Which now breaks lockdeps internal processing. Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-11-10 22:29:49 -05:00
Ian Campbell	9ec23a7f6d	xen: do not release any memory under 1M in domain 0 We already deliberately setup a 1-1 P2M for the region up to 1M in order to allow code which assumes this region is already mapped to work without having to convert everything to ioremap. Domain 0 should not return any apparently unused memory regions (reserved or otherwise) in this region to Xen since the e820 may not accurately reflect what the BIOS has stashed in this region. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-10 17:19:25 -08:00
Kees Cook	6036f373ea	x86, cpu: Only CPU features determine NX capabilities Fix the NX feature boot warning when NX is missing to correctly reflect that BIOSes cannot disable NX now. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-5-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:43:15 -08:00
Kees Cook	ebba638ae7	x86, cpu: Call verify_cpu during 32bit CPU startup The XD_DISABLE-clearing side-effect needs to happen for both 32bit and 64bit, but the 32bit init routines were not calling verify_cpu() yet. This adds that call to gain the side-effect. The longmode/SSE tests being performed in verify_cpu() need to happen very early for 64bit but not for 32bit. Instead of including it in two places for 32bit, we can just include it once in arch/x86/kernel/head_32.S. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:43:09 -08:00
Kees Cook	ae84739c27	x86, cpu: Clear XD_DISABLED flag on Intel to regain NX Intel CPUs have an additional MSR bit to indicate if the BIOS was configured to disable the NX cpu feature. This bit was traditionally used for operating systems that did not understand how to handle the NX bit. Since Linux understands this, this BIOS flag should be ignored by default. In a review[1] of reported hardware being used by Ubuntu bug reporters, almost 10% of systems had an incorrectly configured BIOS, leaving their systems unable to use the NX features of their CPU. This change will clear the MSR_IA32_MISC_ENABLE_XD_DISABLE bit so that NX cannot be inappropriately controlled by the BIOS on Intel CPUs. If, under very strange hardware configurations, NX actually needs to be disabled, "noexec=off" can be used to restore the prior behavior. [1] http://www.outflux.net/blog/archives/2010/02/18/data-mining-for-nx-bit/ Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-3-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:42:54 -08:00
Kees Cook	c5cbac6942	x86, cpu: Rename verify_cpu_64.S to verify_cpu.S The code is 32bit already, and can be used in 32bit routines. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-2-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:42:42 -08:00
Peter Zijlstra	034c6efa46	perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation Jasper suggested we use the zeroing capability of the allocators instead of calling memset ourselves. Add node affinity while we're at it. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 22:58:40 +01:00
Borislav Petkov	c7657ac0c3	x86, microcode, AMD: Cleanup code a bit get_ucode_data is a memcpy() wrapper which always returns 0. Move it into the header and make it an inline. Remove all code checking its return value and turn it into a void. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-10 14:54:54 +01:00
Jesper Juhl	1ea6be212e	x86, microcode, AMD: Replace vmalloc+memset with vzalloc We don't have to do memset() ourselves after vmalloc() when we have vzalloc(), so change that in arch/x86/kernel/microcode_amd.c::get_next_ucode(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-10 14:48:57 +01:00
Kusanagi Kouichi	1f523bf367	x86, pvclock: Remove leftover scale_delta() function Commit 92580d64e16402762e2acc3022f065397c780425 ("x86: pvclock: Move scale_delta into common header") forgot to remove scale_delta. Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Cc: Zachary Amsden <zamsden@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Glauber Costa <glommer@redhat.com> LKML-Reference: <20101105110444.BAF6D6FC03B@msa105.auone-net.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:32:15 +01:00
Jesper Juhl	2a8dcbd6cd	x86, apic: Remove double #include Remove the second <asm/atomic.h> inclusion. Signed-off-by: Jesper Juhl <jj@chaosbits.net> LKML-Reference: <alpine.LNX.2.00.1011072253360.26247@swampdragon.chaosbits.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:21:16 +01:00
Jan Beulich	2f62bf7d23	x86: Adjust section annotations in AMD Fam10 MMCONF enabling code check_enable_amd_mmconf_dmi() gets called only for the BSP, hence everything hanging off of it can be __init*. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CD2DE1E0200007800020990@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:08:26 +01:00
Jack Steiner	62b0cfc240	x86, UV: Update node controller MMRs A new version of the SGI UV hub node controller is being developed. A few of the MMRs (control registers) that exist on the current hub no longer exist on the new hub. Fortunately, there are alternate MMRs that are are functionally equivalent and that exist on both hubs. This patch changes the UV code to use MMRs that exist in BOTH versions of the hub node controller. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20101106204056.GA27584@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:06:38 +01:00
Jesper Juhl	8e5e9521c1	x86: Remove unnecessary casts of void ptr returning alloc function return values The [vk][cmz]alloc(_node) family of functions return void pointers which it's completely unnecessary/pointless to cast to other pointer types since that happens implicitly. This patch removes such casts from arch/x86. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: trivial@kernel.org Cc: amd64-microcode@amd64.org Cc: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <alpine.LNX.2.00.1011082310220.23697@swampdragon.chaosbits.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 09:13:00 +01:00
Andi Kleen	0059b2436a	x86: Address gcc4.6 "set but not used" warnings in apic.h native_apic_msr_read() and x2apic_enabled() use rdmsr(msr, low, high), but only use the low part. gcc4.6 complains about this: .../apic.h:144:11: warning: variable 'high' set but not used [-Wunused-but-set-variable] rdmsr() is just a wrapper around rdmsrl() which splits the 64bit value into low and high, so using rdmsrl() directly solves this. [tglx: Changed the variables to u64 as suggested by Cyrill. It's less confusing and has no code impact as this is 64bit only anyway. Massaged changelog as well. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: x86@kernel.org Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1289251229-19589-1-git-send-email-andi@firstfloor.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-09 18:40:30 +01:00
Jacob Pan	7f05dec3dd	x86: mrst: Parse SFI timer table for all timer configs Penwell has APB timer based watchdog timers, it requires platform code to parse SFI MTMR tables in order to claim its timer. This patch will always parse SFI MTMR regardless of system timer configuration choices. Otherwise, SFI MTMR table may not get parsed if running on Medfield with always-on local APIC timers and constant TSC. Watchdog timer driver will then not get a timer to use. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101109112800.20591.10802.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-09 14:45:52 +01:00
Feng Tang	1da4b1c6a4	x86/mrst: Add SFI platform device parsing code SFI provides a series of tables. These describe the platform devices present including SPI and I²C devices, as well as various sensors, keypads and other glue as well as interfaces provided via the SCU IPC mechanism (intel_scu_ipc.c) This patch is a merge of the core elements and relevant fixes from the Intel development code by Feng, Alek, myself into a single coherent patch for upstream submission. It provides the needed infrastructure to register I2C, SPI and platform devices described by the tables, as well as handlers for some of the hardware already supported in kernel. The 0.8 firmware also provides GPIO tables. Devices are created at boot time or if they are SCU dependant at the point an SCU is discovered. The existing Linux device mechanisms will then handle the device binding. At an abstract level this is an SFI to Linux device translator. Device/platform specific setup/glue is in this file. This is done so that the drivers for the generic I²C and SPI bus devices remain cross platform as they should. (Updated from RFC version to correct the emc1403 name used by the firmware and a wrongly used #define) Signed-off-by: Alek Du <alek.du@linux.intel.com> LKML-Reference: <20101109112158.20013.6158.stgit@localhost.localdomain> [Clean ups, removal of 0.7 support] Signed-off-by: Feng Tang <feng.tang@linux.intel.com> [Clean ups] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-09 14:45:52 +01:00
Jiri Slaby	07cf2a64c2	xen: fix memory leak in Xen PCI MSI/MSI-X allocator. Stanse found that xen_setup_msi_irqs leaks memory when xen_allocate_pirq fails. Free the memory in that fail path. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xensource.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org	2010-11-08 11:30:00 -05:00
Jan Kiszka	453d9c57e2	KVM: x86: Issue smp_call_function_many with preemption disabled smp_call_function_many is specified to be called only with preemption disabled. Fulfill this requirement. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:27 -02:00
Vasiliy Kulikov	97e69aa62f	KVM: x86: fix information leak to userland Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and kvm_clock_data are copied to userland with some padding and reserved fields unitialized. It leads to leaking of contents of kernel stack memory. We have to initialize them to zero. In patch v1 Jan Kiszka suggested to fill reserved fields with zeros instead of memset'ting the whole struct. It makes sense as these fields are explicitly marked as padding. No more fields need zeroing. KVM-Stable-Tag. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:27 -02:00
Marcelo Tosatti	eb45fda45f	KVM: MMU: fix rmap_remove on non present sptes drop_spte should not attempt to rmap_remove a non present shadow pte. This fixes a BUG_ON seen on kvm-autotest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Lucas Meneghel Rodrigues <lmr@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-11-05 14:42:26 -02:00
Michael S. Tsirkin	edde99ce05	KVM: Write protect memory after slot swap I have observed the following bug trigger: 1. userspace calls GET_DIRTY_LOG 2. kvm_mmu_slot_remove_write_access is called and makes a page ro 3. page fault happens and makes the page writeable fault is logged in the bitmap appropriately 4. kvm_vm_ioctl_get_dirty_log swaps slot pointers a lot of time passes 5. guest writes into the page 6. userspace calls GET_DIRTY_LOG At point (5), bitmap is clean and page is writeable, thus, guest modification of memory is not logged and GET_DIRTY_LOG returns an empty bitmap. The rule is that all pages are either dirty in the current bitmap, or write-protected, which is violated here. It seems that just moving kvm_mmu_slot_remove_write_access down to after the slot pointer swap should fix this bug. KVM-Stable-Tag. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-11-05 14:42:25 -02:00
Uwe Kleine-König	b595076a18	tree-wide: fix comment/printk typos "gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-11-01 15:38:34 -04:00
Rakib Mullick	cf38d0ba7e	x86, mm: Fix section mismatch in tlb.c Mark tlb_cpuhp_notify as __cpuinit. It's basically a callback function, which is called from __cpuinit init_smp_flash(). So - it's safe. We were warned by the following warning: WARNING: arch/x86/mm/built-in.o(.text+0x356d): Section mismatch in reference from the function tlb_cpuhp_notify() to the function .cpuinit.text:calculate_tlb_offset() The function tlb_cpuhp_notify() references the function __cpuinit calculate_tlb_offset(). This is often because tlb_cpuhp_notify lacks a __cpuinit annotation or the annotation of calculate_tlb_offset is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <AANLkTinWQRG=HA9uB3ad0KAqRRTinL6L_4iKgF84coph@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-01 10:09:07 +01:00
Linus Torvalds	f02a38d86a	Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: jump label: Add work around to i386 gcc asm goto bug x86, ftrace: Use safe noops, drop trap test jump_label: Fix unaligned traps on sparc. jump label: Make arch_jump_label_text_poke_early() optional jump label: Fix error with preempt disable holding mutex oprofile: Remove deprecated use of flush_scheduled_work() oprofile: Fix the hang while taking the cpu offline jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex jump label: Fix module __init section race * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Check irq_remapped instead of remapping_enabled in destroy_irq()	2010-10-30 11:43:26 -07:00
Yinghai Lu	7b79462a20	x86: Check irq_remapped instead of remapping_enabled in destroy_irq() Russ Anderson reported: \| There is a regression that is causing a NULL pointer dereference \| in free_irte when shutting down xpc. git bisect narrowed it down \| to git commit d585d06(intr_remap: Simplify the code further), which \| changed free_irte(). Reverse applying the patch fixes the problem. We need to use irq_remapped() for each irq instead of checking only intr_remapping_enabled as there might be non remapped irqs even when remapping is enabled. [ tglx: use cfg instead of retrieving it again. Massaged changelog ] Reported-bisected-and-tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4CCBD511.40607@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-30 10:28:31 +02:00
Linus Torvalds	2d10d8737c	Merge branches 'x86-fixes-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, alternative: Call stop_machine_text_poke() on all cpus x86-32: Restore irq stacks NUMA-aware allocations x86, memblock: Fix early_node_mem with big reserved region. * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, uv: More Westmere support on SGI UV x86, uv: Enable Westmere support on SGI UV	2010-10-29 18:58:00 -07:00
Jason Baron	404ba5d7bb	x86, alternative: Call stop_machine_text_poke() on all cpus Currently, text_poke_smp() passes a NULL as the third argument to __stop_machine(), which will only run stop_machine_text_poke() on 1 cpu. Change NULL -> cpu_online_mask, as stop_machine_text_poke() is intended to be run on all cpus. I actually didn't notice any problems with stop_machine_text_poke() only being called on 1 cpu, but found this via code inspection. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <20101028152026.GB2875@redhat.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-29 16:42:58 -07:00
Ian Campbell	a2d771c036	xen: correct size of level2_kernel_pgt sizeof(pmd_t *) is 4 bytes on 32-bit PAE leading to an allocation of only 2048 bytes. The correct size is sizeof(pmd_t) giving us a full page allocation. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-29 12:23:57 -07:00
Linus Torvalds	1e431a9d64	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb,ppc: Individual register get/set for ppc kgdbts: prevent re-entry to kgdbts before it unregisters debug_core,x86,blackfin: Clean up hw debug disable API kdb: Fix early debugging crash regression kgdb,arm: fix register dump kdb: fix per_cpu command to remove supress mask kdb: Add kdb kernel module sample	2010-10-29 11:49:38 -07:00
Steven Rostedt	45f81b1c96	jump label: Add work around to i386 gcc asm goto bug On i386 (not x86_64) early implementations of gcc would have a bug with asm goto causing it to produce code like the following: (This was noticed by Peter Zijlstra) 56 pushl 0 67 nopl jmp 0x6f popl jmp 0x8c 6f mov test je 0x8c 8c mov call *(%esp) The jump added in the asm goto skipped over the popl that matched the pushl 0, which lead up to a quick crash of the system when the jump was enabled. The nopl is defined in the asm goto () statement and when tracepoints are enabled, the nop changes to a jump to the label that was specified by the asm goto. asm goto is suppose to tell gcc that the code in the asm might jump to an external label. Here gcc obviously fails to make that work. The bug report for gcc is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226 The bug only appears on x86 when not compiled with -maccumulate-outgoing-args. This option is always set on x86_64 and it is also the work around for a function graph tracer i386 bug. (See commit: `746357d6a5`) This explains why the bug only showed up on i386 when function graph tracer was not enabled. This patch now adds a CONFIG_JUMP_LABEL option that is default off instead of using jump labels by default. When jump labels are enabled, the -maccumulate-outgoing-args will be used (causing a slightly larger kernel image on i386). This option will exist until we have a way to detect if the gcc compiler in use is safe to use on all configurations without the work around. Note, there exists such a test, but for now we will keep the enabling of jump label as a manual option. Archs that know the compiler is safe with asm goto, may choose to select JUMP_LABEL and enable it by default. Reported-by: Ingo Molnar <mingo@elte.hu> Cause-discovered-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Baron <jbaron@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Miller <davem@davemloft.net> Cc: Richard Henderson <rth@redhat.com> LKML-Reference: <1288028746.3673.11.camel@laptop> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-29 14:45:29 -04:00
Dongdong Deng	d7ba979d45	debug_core,x86,blackfin: Clean up hw debug disable API The kgdb_disable_hw_debug() was an architecture specific function for disabling all hardware breakpoints on a per cpu basis when entering the debug core. This patch will remove the weak function kdbg_disable_hw_debug() and change it into a call back which lives with the rest of hw breakpoint call backs in struct kgdb_arch. Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-10-29 13:14:41 -05:00
H. Peter Anvin	2d1d7126bb	x86, ftrace: Use safe noops, drop trap test Always use a safe 5-byte noop sequence. Drop the trap test, since it is known to return false negatives on some virtualization platforms on 32 bits. The resulting code is both simpler and safer. Cc: Daniel Drake <dsd@laptop.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-29 13:07:59 -04:00
Eric Dumazet	5c1eb08936	x86-32: Restore irq stacks NUMA-aware allocations Commit `22d4cd4c4d` ("Allocate irq stacks seperate from percpu area") removed NUMA affinity of IRQ stacks as side-effect of the fix. Using alloc_pages_node() instead of __get_free_pages() is safe, even if the target node has no available LOWMEM pages : alloc_pages_node() fallbacks to another node. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Brian Gerst <brgerst@gmail.com> Cc: tj@kernel.org Cc: torvalds@linux-foundation.org Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1288276854.2649.607.camel@edumazet-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-29 08:17:07 +02:00
Russ Anderson	0520bd8438	x86, uv: More Westmere support on SGI UV Enable Westmere support for all APIC modes on SGI UV. Signed-off-by: Russ Anderson <rja@sgi.com> LKML-Reference: <20101028224132.GB15804@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-28 22:38:07 -07:00
Linus Torvalds	18cb657ca1	Merge branch 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: register xen pci notifier xen: initialize cpu masks for pv guests in xen_smp_init xen: add a missing #include to arch/x86/pci/xen.c xen: mask the MTRR feature from the cpuid xen: make hvc_xen console work for dom0. xen: add the direct mapping area for ISA bus access xen: Initialize xenbus for dom0. xen: use vcpu_ops to setup cpu masks xen: map a dummy page for local apic and ioapic in xen_set_fixmap xen: remap MSIs into pirqs when running as initial domain xen: remap GSIs as pirqs when running as initial domain xen: introduce XEN_DOM0 as a silent option xen: map MSIs into pirqs xen: support GSI -> pirq remapping in PV on HVM guests xen: add xen hvm acpi_register_gsi variant acpi: use indirect call to register gsi in different modes xen: implement xen_hvm_register_pirq xen: get the maximum number of pirqs from xen xen: support pirq != irq * 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits) X86/PCI: Remove the dependency on isapnp_disable. xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright. x86: xen: Sanitse irq handling (part two) swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it. MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer. xen/pci: Request ACS when Xen-SWIOTLB is activated. xen-pcifront: Xen PCI frontend driver. xenbus: prevent warnings on unhandled enumeration values xenbus: Xen paravirtualised PCI hotplug support. xen/x86/PCI: Add support for the Xen PCI subsystem x86: Introduce x86_msi_ops msi: Introduce default_[teardown\|setup]_msi_irqs with fallback. x86/PCI: Export pci_walk_bus function. x86/PCI: make sure _PAGE_IOMAP it set on pci mappings x86/PCI: Clean up pci_cache_line_size xen: fix shared irq device passthrough xen: Provide a variant of xen_poll_irq with timeout. xen: Find an unbound irq number in reverse order (high to low). xen: statically initialize cpu_evtchn_mask_p ... Fix up trivial conflicts in drivers/pci/Makefile	2010-10-28 17:11:17 -07:00
Linus Torvalds	51399a3919	Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (38 commits) kbuild: convert `arch/tile' to the kconfig mainmenu upgrade README: cite nconfig Revert "kconfig: Temporarily disable dependency warnings" kconfig: Use PATH_MAX instead of 128 for path buffer sizes. kconfig: Fix realloc usage() kconfig: Propagate const kconfig: Don't go out from read config loop when you read new symbol kconfig: fix menuconfig on debian lenny kbuild: migrate all arch to the kconfig mainmenu upgrade kconfig: expand file names kconfig: use the file's name of sourced file kconfig: constify file name kconfig: don't emit warning upon rootmenu's prompt redefinition kconfig: replace KERNELVERSION usage by the mainmenu's prompt kconfig: delay gconf window initialization kconfig: expand by default the rootmenu's prompt kconfig: add a symbol string expansion helper kconfig: regen parser kconfig: implement the `mainmenu' directive kconfig: allow PACKAGE to be defined on the compiler's command-line ... Fix up trivial conflict in arch/mn10300/Kconfig	2010-10-28 16:16:39 -07:00
Yinghai Lu	419db274be	x86, memblock: Fix early_node_mem with big reserved region. Xen can reserve huge amounts of memory for pre-ballooning, but that still shows as RAM in the e820 memory map. early_node_mem could not find range because of start/end adjusting, and will go through the fallback path. However, the fallback patch is still using memblock_x86_find_range_node(), and it is partially top-down because it go through active_range entries from low to high. Let's use memblock_find_in_range instead memblock_x86_find_range_node. So get real top down in fallback path. We may still need to make memblock_x86_find_range_node to do overall top_down work. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CC9A9C9.8020700@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-28 15:52:36 -07:00
Linus Torvalds	2d3b07c07b	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Move olpc to platform x86: Move uv to platform x86: Move mrst to platform x86: Move scx200 to platform x86: Move visws to platform x86: Move efi to platform x86: Move sfi to platform x86: Add platform directory	2010-10-28 12:25:42 -07:00
Linus Torvalds	e9f29c9a56	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits) x86: allocate space within a region top-down x86: update iomem_resource end based on CPU physical address capabilities x86/PCI: allocate space from the end of a region, not the beginning PCI: allocate bus resources from the top down resources: support allocating space within a region from the top down resources: handle overflow when aligning start of available area resources: ensure callback doesn't allocate outside available space resources: factor out resource_clip() to simplify find_resource() resources: add a default alignf to simplify find_resource() x86/PCI: MMCONFIG: fix region end calculation PCI: Add support for polling PME state on suspended legacy PCI devices PCI: Export some PCI PM functionality PCI: fix message typo PCI: log vendor/device ID always PCI: update Intel chipset names and defines PCI: use new ccflags variable in Makefile PCI: add PCI_MSIX_TABLE/PBA defines PCI: add PCI vendor id for STmicroelectronics x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs PCI: OLPC: Only enable PCI configuration type override on XO-1 ...	2010-10-28 11:59:52 -07:00
Linus Torvalds	a042e26137	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits) perf python scripting: Add futex-contention script perf python scripting: Fixup cut'n'paste error in sctop script perf scripting: Shut up 'perf record' final status perf record: Remove newline character from perror() argument perf python scripting: Support fedora 11 (audit 1.7.17) perf python scripting: Improve the syscalls-by-pid script perf python scripting: print the syscall name on sctop perf python scripting: Improve the syscalls-counts script perf python scripting: Improve the failed-syscalls-by-pid script kprobes: Remove redundant text_mutex lock in optimize x86/oprofile: Fix uninitialized variable use in debug printk tracing: Fix 'faild' -> 'failed' typo perf probe: Fix format specified for Dwarf_Off parameter perf trace: Fix detection of script extension perf trace: Use $PERF_EXEC_PATH in canned report scripts perf tools: Document event modifiers perf tools: Remove direct slang.h include perf_events: Fix for transaction recovery in group_sched_in() perf_events: Revert: Fix transaction recovery in group_sched_in() perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations ...	2010-10-27 18:48:00 -07:00
Linus Torvalds	17bb51d56c	Merge branch 'akpm-incoming-2' * akpm-incoming-2: (139 commits) epoll: make epoll_wait() use the hrtimer range feature select: rename estimate_accuracy() to select_estimate_accuracy() Remove duplicate includes from many files ramoops: use the platform data structure instead of module params kernel/resource.c: handle reinsertion of an already-inserted resource kfifo: fix kfifo_alloc() to return a signed int value w1: don't allow arbitrary users to remove w1 devices alpha: remove dma64_addr_t usage mips: remove dma64_addr_t usage sparc: remove dma64_addr_t usage fuse: use release_pages() taskstats: use real microsecond granularity for CPU times taskstats: split fill_pid function taskstats: separate taskstats commands delayacct: align to 8 byte boundary on 64-bit systems delay-accounting: reimplement -c for getdelays.c to report information on a target command namespaces Kconfig: move namespace menu location after the cgroup namespaces Kconfig: remove the cgroup device whitelist experimental tag namespaces Kconfig: remove pointless cgroup dependency namespaces Kconfig: make namespace a submenu ...	2010-10-27 18:42:52 -07:00
Linus Torvalds	0671b7674f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: percpu: Remove the multi-page alignment facility x86-32: Allocate irq stacks seperate from percpu area x86-32, mm: Remove duplicated #include x86, printk: Get rid of <0> from stack output x86, kexec: Make sure to stop all CPUs before exiting the kernel x86/vsmp: Eliminate kconfig dependency warning	2010-10-27 18:38:55 -07:00
Zimny Lech	61d8e11e51	Remove duplicate includes from many files Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:18 -07:00
Namhyung Kim	eb5a369931	ptrace: cleanup arch_ptrace() on x86 Remove checking @addr less than 0 because @addr is now unsigned and use new udescp variable in order to remove unnecessary castings. [akpm@linux-foundation.org: fix unused variable 'udescp'] Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:10 -07:00
Namhyung Kim	9b05a69e05	ptrace: change signature of arch_ptrace() Fix up the arguments to arch_ptrace() to take account of the fact that @addr and @data are now unsigned long rather than long as of a preceding patch in this series. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: <linux-arch@vger.kernel.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:10 -07:00
Peter Zijlstra	20273941f2	mm: fix race in kunmap_atomic() Christoph reported a nice splat which illustrated a race in the new stack based kmap_atomic implementation. The problem is that we pop our stack slot before we're completely done resetting its state -- in particular clearing the PTE (sometimes that's CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear the PTE used for the last slot, that interrupt can reuse the slot in a dirty state, which triggers a BUG in kmap_atomic(). Fix this by introducing kmap_atomic_idx() which reports the current slot index without actually releasing it and use that to find the PTE and delay the _pop() until after we're completely done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Christoph Hellwig <hch@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:05 -07:00
Brian Gerst	22d4cd4c4d	x86-32: Allocate irq stacks seperate from percpu area The percpu allocator cannot handle alignments larger than one page. Allocate the irq stacks seperately, and only keep the pointers as percpu data. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: tj@kernel.org LKML-Reference: <1288158182-1753-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-27 17:31:42 +02:00
Thomas Gleixner	8654b1c2de	x86: Move olpc to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andres Salomon <dilinger@queued.net>	2010-10-27 17:22:16 +02:00
Thomas Gleixner	329b84e42e	x86: Move uv to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Travis <travis@sgi.com>	2010-10-27 14:30:02 +02:00
Thomas Gleixner	9694d4afc1	x86: Move mrst to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	3b3da9d25a	x86: Move scx200 to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	c4e72ad6bb	x86: Move visws to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	b17ed48040	x86: Move efi to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Huang Ying <ying.huang@intel.com>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	937f961a65	x86: Move sfi to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	3adbb7f4a3	x86: Add platform directory x86 has finally arrived in the embedded nightmare and will rapidly grow SoC platform support in various flavours. So we need a place for the platform support files. That also allows us to clean up the dumpground which arch/x86/kernel has become over time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-27 14:30:01 +02:00
Linus Torvalds	520045db94	Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen/privcmd: make privcmd visible in domU xen/privcmd: move remap_domain_mfn_range() to core xen code and export. privcmd: MMAPBATCH: Fix error handling/reporting xenbus: export xen_store_interface for xenfs xen/privcmd: make sure vma is ours before doing anything to it xen/privcmd: print SIGBUS faults xen/xenfs: set_page_dirty is supposed to return true if it dirties xen/privcmd: create address space to allow writable mmaps xen: add privcmd driver xen: add variable hypercall caller xen: add xen_set_domain_pte() xen: add /proc/xen/xsd_{kva,port} to xenfs * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits) xen: include xen/xen.h for definition of xen_initial_domain() xen: use host E820 map for dom0 xen: correctly rebuild mfn list list after migration. xen: improvements to VIRQ_DEBUG output xen: set up IRQ before binding virq to evtchn xen: ensure that all event channels start off bound to VCPU 0 xen/hvc: only notify if we actually sent something xen: don't add extra_pages for RAM after mem_end xen: add support for PAT xen: make sure xen_max_p2m_pfn is up to date xen: limit extra memory to a certain ratio of base xen: add extra pages for E820 RAM regions, even if beyond mem_end xen: make sure xen_extra_mem_start is beyond all non-RAM e820 xen: implement "extra" memory to reserve space for pages not present at boot xen: Use host-provided E820 map xen: don't map missing memory xen: defer building p2m mfn structures until kernel is mapped xen: add return value to set_phys_to_machine() xen: convert p2m to a 3 level tree xen: make install_p2mtop_page() static ... Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of 'reserve_early()' - in the new memblock world order it is now 'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.	2010-10-26 18:20:19 -07:00
Linus Torvalds	474829e875	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (53 commits) ACPI: install ACPI table handler before any dynamic tables being loaded ACPI / PM: Blacklist another machine that needs acpi_sleep=nonvs ACPI: Page based coalescing of I/O remappings optimization ACPI: Convert simple locking to RCU based locking ACPI: Pre-map 'system event' related register blocks ACPI: Add interfaces for ioremapping/iounmapping ACPI registers ACPI: Maintain a list of ACPI memory mapped I/O remappings ACPI: Fix ioremap size for MMIO reads and writes ACPI / Battery: Return -ENODEV for unknown values in get_property() ACPI / PM: Fix reference counting of power resources Subject: [PATCH] ACPICA: Fix Scope() op in module level code ACPI battery: support percentage battery remaining capacity ACPI: Make Embedded Controller command timeout delay configurable ACPI dock: move some functions to .init.text ACPI: thermal: remove unused limit code ACPI: static sleep_states[] and acpi_gts_bfs_check ACPI: remove dead code ACPI: delete dedicated MAINTAINERS entries for ACPI EC and BATTERY drivers ACPI: Only processor needs CPU_IDLE ACPICA: Update version to 20101013 ...	2010-10-26 17:28:37 -07:00
Andrew Morton	ca1cab37d9	workqueues: s/ON_STACK/ONSTACK/ Silly though it is, completions and wait_queue_heads use foo_ONSTACK (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK, __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I guess workqueues should do the same thing. s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/ s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/ Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:14 -07:00
Hagen Paul Pfeifer	732eacc054	replace nested max/min macros with {max,min}3 macro Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:12 -07:00
Michel Lespinasse	68da336a14	x86: access_error API cleanup access_error() already takes error_code as an argument, so there is no need for an additional write flag. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:09 -07:00
Michel Lespinasse	d065bd810b	mm: retry page fault when blocking on disk transfer This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [akpm@linux-foundation.org: fix warning & crash] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:09 -07:00
Peter Zijlstra	7a837d1bb7	perf, x86: Fix up kmap_atomic() type Now that the KM_type stuff is history, clean up the compiler warning. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Peter Zijlstra	ece0e2b640	mm: remove pte_map_nested() Since we no longer need to provide KM_type, the whole pte_map_nested() API is now redundant, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Peter Zijlstra	3e4d3af501	mm: stack based kmap_atomic() Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Bjorn Helgaas	1af3c2e45e	x86: allocate space within a region top-down Request that allocate_resource() use available space from high addresses first, rather than the default of using low addresses first. The most common place this makes a difference is when we move or assign new PCI device resources. Low addresses are generally scarce, so it's better to use high addresses when possible. This follows Windows practice for PCI allocation. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-26 15:33:45 -07:00
Bjorn Helgaas	419afdf53c	x86: update iomem_resource end based on CPU physical address capabilities The iomem_resource map reflects the available physical address space. We statically initialize the end to -1, i.e., 0xffffffff_ffffffff, but of course we can only use as much as the CPU can address. This patch updates the end based on the CPU capabilities, so we don't mistakenly allocate space that isn't usable, as we're likely to do when allocating from the top-down. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-26 15:33:44 -07:00
Bjorn Helgaas	dc9887dc02	x86/PCI: allocate space from the end of a region, not the beginning Allocate from the end of a region, not the beginning. For example, if we need to allocate 0x800 bytes for a device on bus 0000:00 given these resources: [mem 0xbff00000-0xdfffffff] PCI Bus 0000:00 [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 the available space at [mem 0xbff00000-0xbfffffff] is passed to the alignment callback (pcibios_align_resource()). Prior to this patch, we would put the new 0x800 byte resource at the beginning of that available space, i.e., at [mem 0xbff00000-0xbff007ff]. With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff]. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-26 15:33:42 -07:00
Russ Anderson	c8f730b1ab	x86, uv: Enable Westmere support on SGI UV Enable Westmere support on SGI UV. The UV initialization code is dependent on the APICID bits. Westmere-EX uses different APIC bit mapping than Nehalem-EX. This code reads the apic shift value from a UV MMR to do the proper bit decoding to determint the pnode. Signed-off-by: Russ Anderson <rja@sgi.com> LKML-Reference: <20101026212728.GB15071@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-26 15:15:28 -07:00
Stefano Stabellini	ea5b8f7393	xen: initialize cpu masks for pv guests in xen_smp_init Pv guests don't have ACPI and need the cpu masks to be set correctly as early as possible so we call xen_fill_possible_map from xen_smp_init. On the other hand the initial domain supports ACPI so in this case we skip xen_fill_possible_map and rely on it. However Xen might limit the number of cpus usable by the domain, so we filter those masks during smp initialization using the VCPUOP_is_up hypercall. It is important that the filtering is done before xen_setup_vcpu_info_placement. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-10-26 20:33:15 +01:00
Linus Torvalds	f1ebdd60cc	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...	2010-10-26 10:13:10 -07:00
Linus Torvalds	4f6876031e	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit. [CPUFREQ] add sampling_down_factor tunable to improve ondemand performance [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type [CPUFREQ] drivers/cpufreq: Adjust confusing if indentation	2010-10-26 10:00:04 -07:00
Ian Campbell	45263cb099	xen: include xen/xen.h for definition of xen_initial_domain() CC arch/x86/xen/setup.o arch/x86/xen/setup.c: In function 'xen_memory_setup': arch/x86/xen/setup.c:161: error: implicit declaration of function 'xen_initial_domain' Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-25 16:32:48 -07:00
Borislav Petkov	610470ce80	x86-32, mm: Remove duplicated #include `b40827fa72` added an include directive which is needless and is taken care of by a previous one. Remove it. Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <20101025162523.GA4712@a1.tnic> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 19:39:33 +02:00
Ingo Molnar	7d7a48b760	Merge branch 'linus' into x86/urgent Merge reason: We want to queue up a dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 19:38:52 +02:00
Ingo Molnar	0b849ee888	Merge branch 'x86' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2010-10-25 19:17:32 +02:00
Borislav Petkov	9afd281a15	x86-32, mm: Remove duplicated include Commit `b40827fa72` ("x86-32, mm: Add an initial page table for core bootstrapping") added an include directive which is needless and is taken care of by a previous one. Remove it. Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-25 10:05:13 -07:00
Robert Richter	eb48c9cb20	apic, amd: Make firmware bug messages more meaningful This improves error messages in case the BIOS was setting up wrong LVT offsets. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-6-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:43 +02:00
Robert Richter	0a17941e71	mce, amd: Remove goto in threshold_create_device() Removing the goto in threshold_create_device(). Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-5-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:43 +02:00
Robert Richter	bbaff08dca	mce, amd: Add helper functions to setup APIC This patch reworks and cleans up mce_amd_feature_init() by introducing helper functions to setup and check the LVT offset. It also fixes line endings in pr_err() calls. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:43 +02:00
Robert Richter	7203a04940	mce, amd: Shorten local variables mci_misc_{hi,lo} Shorten this variables to make later changes more readable. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:42 +02:00
Robert Richter	9c37c9d897	mce, amd: Implement mce_threshold_block_init() helper function This patch adds a helper function for the initial setup of an mce threshold block. The LVT offset is passed as argument. Also making variable threshold_defaults local as it is only used in function mce_amd_feature_init(). Function threshold_restart_bank() is extended to setup the LVT offset, the change is backward compatible. Thus, now there is only a single wrmsrl() to setup the block. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:42 +02:00
Linus Torvalds	fbaab1dc19	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (44 commits) eeepc-wmi: Add cpufv sysfs interface eeepc-wmi: add additional hotkeys panasonic-laptop: Simplify calls to acpi_pcc_retrieve_biosdata panasonic-laptop: Handle errors properly if they happen intel_pmic_gpio: fix off-by-one value range checking IBM Real-Time "SMI Free" mode driver -v7 Add OLPC XO-1 rfkill driver Move hdaps driver to platform/x86 ideapad-laptop: Fix Makefile intel_pmic_gpio: swap the bits and mask args for intel_scu_ipc_update_register ideapad: Add param: no_bt_rfkill ideapad: Change the driver name to ideapad-laptop ideapad: rewrite the sw rfkill set ideapad: rewrite the hw rfkill notify ideapad: use EC command to control camera ideapad: use return value of _CFG to tell if device exist or not ideapad: make sure we bind on the correct device ideapad: check VPC bit before sync rfkill hw status ideapad: add ACPI helpers dell-laptop: Add debugfs support ...	2010-10-25 08:28:13 -07:00
Robert Richter	4cafc4b8d7	Merge branch 'oprofile/core' into oprofile/x86 Conflicts: arch/x86/oprofile/op_model_amd.c Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-25 16:58:34 +02:00
Ingo Molnar	2c78ffeca9	x86/oprofile: Fix uninitialized variable use in debug printk Stephen Rothwell reported this build warning: arch/x86/oprofile/op_model_amd.c: In function 'ibs_eilvt_valid': arch/x86/oprofile/op_model_amd.c:289: warning: 'offset' may be used uninitialized in this function And correctly observed that indeed the variable is used uninitialized in this function. The result of this bug can be a debug printk with a bogus value. Also fix a few more small details that made this function hard to read and which probably contributed to the bug being introduced to begin with: - Use more symmetric error conditions - Remove the !0 obfuscation - Add newlines to the printk output - Remove bogus linebreaks in printk strings and elsewhere Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Robert Richter <robert.richter@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20101025115736.41d51abe.sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 08:46:20 +02:00
Len Brown	38add9b4ba	Merge branches 'bugzilla-15807', 'bugzilla-15979-v2' and 'bugzilla-19162' into release	2010-10-25 02:12:27 -04:00
Linus Torvalds	229aebb873	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing\|ed] => interest[ing\|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c	2010-10-24 13:41:39 -07:00
Linus Torvalds	1765a1fe5d	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits) KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages KVM: Fix signature of kvm_iommu_map_pages stub KVM: MCE: Send SRAR SIGBUS directly KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED KVM: fix typo in copyright notice KVM: Disable interrupts around get_kernel_ns() KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address KVM: MMU: move access code parsing to FNAME(walk_addr) function KVM: MMU: audit: check whether have unsync sps after root sync KVM: MMU: audit: introduce audit_printk to cleanup audit code KVM: MMU: audit: unregister audit tracepoints before module unloaded KVM: MMU: audit: fix vcpu's spte walking KVM: MMU: set access bit for direct mapping KVM: MMU: cleanup for error mask set while walk guest page table KVM: MMU: update 'root_hpa' out of loop in PAE shadow path KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() KVM: x86: Fix constant type in kvm_get_time_scale KVM: VMX: Add AX to list of registers clobbered by guest switch KVM guest: Move a printk that's using the clock before it's ready KVM: x86: TSC catchup mode ...	2010-10-24 12:47:25 -07:00
Thomas Gleixner	7fb2b870d6	x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage Stupid me forgot to change the function name for the CONFIG_X86_IO_APIC=n case in commit `23f9b2671` (x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings()) Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-24 11:14:14 +02:00
Huang Ying	77db5cbd29	KVM: MCE: Send SRAR SIGBUS directly Originally, SRAR SIGBUS is sent to QEMU-KVM via touching the poisoned page. But commit `9605456919` prevents the signal from being sent. So now the signal is sent via force_sig_info_fault directly. [marcelo: use send_sig_info instead] Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:15 +02:00
Huang Ying	5854dbca9b	KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED Now we have MCG_SER_P (and corresponding SRAO/SRAR MCE) support in kernel and QEMU-KVM, the MCG_SER_P should be added into KVM_MCE_CAP_SUPPORTED to make all these code really works. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:15 +02:00
Nicolas Kaiser	9611c18777	KVM: fix typo in copyright notice Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:14 +02:00
Avi Kivity	395c6b0a9d	KVM: Disable interrupts around get_kernel_ns() get_kernel_ns() wants preemption disabled. It doesn't make a lot of sense during the get/set ioctls (no way to make them non-racy) but the callee wants it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Avi Kivity	7ebaf15eef	KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	3377078027	KVM: MMU: move access code parsing to FNAME(walk_addr) function Move access code parsing from caller site to FNAME(walk_addr) function Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	6903074c36	KVM: MMU: audit: check whether have unsync sps after root sync After root synced, all unsync sps are synced, this patch add a check to make sure it's no unsync sps in VCPU's page table Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	38904e1287	KVM: MMU: audit: introduce audit_printk to cleanup audit code Introduce audit_printk, and record audit point instead audit name Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:13 +02:00
Xiao Guangrong	c42fffe3a3	KVM: MMU: audit: unregister audit tracepoints before module unloaded fix: Call Trace: [<ffffffffa01e46ba>] ? kvm_mmu_pte_write+0x229/0x911 [kvm] [<ffffffffa01c6ba9>] ? gfn_to_memslot+0x39/0xa0 [kvm] [<ffffffffa01c6c26>] ? mark_page_dirty+0x16/0x2e [kvm] [<ffffffffa01c6d6f>] ? kvm_write_guest_page+0x67/0x7f [kvm] [<ffffffff81066fbd>] ? local_clock+0x2a/0x3b [<ffffffffa01d52ce>] emulator_write_phys+0x46/0x54 [kvm] ...... Code: Bad RIP value. RIP [<ffffffffa0172056>] 0xffffffffa0172056 RSP <ffff880134f69a70> CR2: ffffffffa0172056 Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:13 +02:00
Xiao Guangrong	98224bf1d1	KVM: MMU: audit: fix vcpu's spte walking After nested nested paging, it may using long mode to shadow 32/PAE paging guest, so this patch fix it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:12 +02:00
Xiao Guangrong	33f91edb92	KVM: MMU: set access bit for direct mapping Set access bit while setup up direct page table if it's nonpaing or npt enabled, it's good for CPU's speculate access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:11 +02:00
Xiao Guangrong	20bd40dc64	KVM: MMU: cleanup for error mask set while walk guest page table Small cleanup for set page fault error code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:10 +02:00
Xiao Guangrong	6292757fb0	KVM: MMU: update 'root_hpa' out of loop in PAE shadow path The value of 'vcpu->arch.mmu.pae_root' is not modified, so we can update 'root_hpa' out of the loop. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:09 +02:00
Sheng Yang	7129eecac1	KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() Eliminate: arch/x86/kvm/emulate.c:801: warning: ‘sv’ may be used uninitialized in this function on gcc 4.1.2 Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:09 +02:00
Jan Kiszka	50933623e5	KVM: x86: Fix constant type in kvm_get_time_scale Older gcc versions complain about the improper type (for x86-32), 4.5 seems to fix this silently. However, we should better use the right type initially. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:08 +02:00
Jan Kiszka	07d6f555d5	KVM: VMX: Add AX to list of registers clobbered by guest switch By chance this caused no harm so far. We overwrite AX during switch to/from guest context, so we must declare this. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:07 +02:00
Arjan Koers	19b6a85b78	KVM guest: Move a printk that's using the clock before it's ready Fix a hang during SMP kernel boot on KVM that showed up after commit `489fb490db` (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0 (2.6.34.1). The problem only occurs when CONFIG_PRINTK_TIME is set. KVM-Stable-Tag. Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:06 +02:00
Zachary Amsden	c285545f81	KVM: x86: TSC catchup mode Negate the effects of AN TYM spell while kvm thread is preempted by tracking conversion factor to the highest TSC rate and catching the TSC up when it has fallen behind the kernel view of time. Note that once triggered, we don't turn off catchup mode. A slightly more clever version of this is possible, which only does catchup when TSC rate drops, and which specifically targets only CPUs with broken TSC, but since these all are considered unstable_tsc(), this patch covers all necessary cases. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:05 +02:00
Zachary Amsden	34c238a1d1	KVM: x86: Rename timer function This just changes some names to better reflect the usage they will be given. Separated out to keep confusion to a minimum. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:05 +02:00
Zachary Amsden	5f4e3f8827	KVM: x86: Make math work for other scales The math in kvm_get_time_scale relies on the fact that NSEC_PER_SEC < 2^32. To use the same function to compute arbitrary time scales, we must extend the first reduction step to shrink the base rate to a 32-bit value, and possibly reduce the scaled rate into a 32-bit as well. Note we must take care to avoid an arithmetic overflow when scaling up the tps32 value (this could not happen with the fixed scaled value of NSEC_PER_SEC, but can happen with scaled rates above 2^31. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:04 +02:00
Avi Kivity	49e9d557f9	KVM: VMX: Respect interrupt window in big real mode If an interrupt is pending, we need to stop emulation so we can inject it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:02 +02:00
Mohammed Gamal	a92601bb70	KVM: VMX: Emulated real mode interrupt injection Replace the inject-as-software-interrupt hack we currently have with emulated injection. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Mohammed Gamal	63995653ad	KVM: Add kvm_inject_realmode_interrupt() wrapper This adds a wrapper function kvm_inject_realmode_interrupt() around the emulator function emulate_int_real() to allow real mode interrupt injection. [avi: initialize operand and address sizes before emulating interrupts] [avi: initialize rip for real mode interrupt injection] [avi: clear interrupt pending flag after emulating interrupt injection] Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Mohammed Gamal	4ab8e02404	KVM: x86 emulator: Expose emulate_int_real() Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:00 +02:00
Hillf Danton	cb16a7b387	KVM: MMU: fix counting of rmap entries in rmap_add() It seems that rmap entries are under counted. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:59 +02:00
Gleb Natapov	a0a07cd2c5	KVM: SVM: do not generate "external interrupt exit" if other exit is pending Nested SVM checks for external interrupt after injecting nested exception. In case there is external interrupt pending the code generates "external interrupt exit" and overwrites previous exit info. If previously injected exception already generated exit it will be lost. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Avi Kivity	f4f5105087	KVM: Convert PIC lock from raw spinlock to ordinary spinlock The PIC code used to be called from preempt_disable() context, which wasn't very good for PREEMPT_RT. That is no longer the case, so move back from raw_spinlock_t to spinlock_t. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Zachary Amsden	28e4639adf	KVM: x86: Fix kvmclock bug If preempted after kvmclock values are updated, but before hardware virtualization is entered, the last tsc time as read by the guest is never set. It underflows the next time kvmclock is updated if there has not yet been a successful entry / exit into hardware virt. Fix this by simply setting last_tsc to the newly read tsc value so that any computed nsec advance of kvmclock is nulled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Joerg Roedel	0959ffacf3	KVM: MMU: Don't track nested fault info in error-code This patch moves the detection whether a page-fault was nested or not out of the error code and moves it into a separate variable in the fault struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:55 +02:00
Avi Kivity	625831a3f4	KVM: VMX: Move fixup_rmode_irq() to avoid forward declaration No code changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	b463a6f744	KVM: Non-atomic interrupt injection Change the interrupt injection code to work from preemptible, interrupts enabled context. This works by adding a ->cancel_injection() operation that undoes an injection in case we were not able to actually enter the guest (this condition could never happen with atomic injection). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	83422e17c1	KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry Currently vmx_complete_interrupts() can decode event information from vmx exit fields into the generic kvm event queues. Make it able to decode the information from the entry fields as well by parametrizing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:52 +02:00
Avi Kivity	537b37e267	KVM: VMX: Move real-mode interrupt injection fixup to vmx_complete_interrupts() This allows reuse of vmx_complete_interrupts() for cancelling injections. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00
Avi Kivity	51aa01d13d	KVM: VMX: Split up vmx_complete_interrupts() vmx_complete_interrupts() does too much, split it up: - vmx_vcpu_run() gets the "cache important vmcs fields" part - a new vmx_complete_atomic_exit() gets the parts that must be done atomically - a new vmx_recover_nmi_blocking() does what its name says - vmx_complete_interrupts() retains the event injection recovery code This helps in reducing the work done in atomic context. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00
Avi Kivity	3842d135ff	KVM: Check for pending events before attempting injection Instead of blindly attempting to inject an event before each guest entry, check for a possible event first in vcpu->requests. Sites that can trigger event injection are modified to set KVM_REQ_EVENT: - interrupt, nmi window opening - ppr updates - i8259 output changes - local apic irr changes - rflags updates - gif flag set - event set on exit This improves non-injecting entry performance, and sets the stage for non-atomic injection. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:50 +02:00
Avi Kivity	b0bc3ee2b5	KVM: MMU: Fix regression with ept memory types merged into non-ept page tables Commit "KVM: MMU: Make tdp_enabled a mmu-context parameter" made real-mode set ->direct_map, and changed the code that merges in the memory type depend on direct_map instead of tdp_enabled. However, in this case what really matters is tdp, not direct_map, since tdp changes the pte format regardless of whether the mapping is direct or not. As a result, real-mode shadow mappings got corrupted with ept memory types. The result was a huge slowdown, likely due to the cache being disabled. Change it back as the simplest fix for the regression (real fix is to move all that to vmx code, and not use tdp_enabled as a synonym for ept). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:49 +02:00
Joerg Roedel	4c62a2dc92	KVM: X86: Report SVM bit to userspace only when supported This patch fixes a bug in KVM where it _always_ reports the support of the SVM feature to userspace. But KVM only supports SVM on AMD hardware and only when it is enabled in the kernel module. This patch fixes the wrong reporting. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:48 +02:00
Joerg Roedel	3d4aeaad8b	KVM: SVM: Report Nested Paging support to userspace This patch implements the reporting of the nested paging feature support to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:47 +02:00
Joerg Roedel	55c5e464fc	KVM: SVM: Expect two more candiates for exit_int_info This patch adds INTR and NMI intercepts to the list of expected intercepts with an exit_int_info set. While this can't happen on bare metal it is architectural legal and may happen with KVMs SVM emulation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:46 +02:00
Joerg Roedel	4b16184c1c	KVM: SVM: Initialize Nested Nested MMU context on VMRUN This patch adds code to initialize the Nested Nested Paging MMU context when the L1 guest executes a VMRUN instruction and has nested paging enabled in its VMCB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:46 +02:00
Joerg Roedel	5bd2edc341	KVM: SVM: Implement MMU helper functions for Nested Nested Paging This patch adds the helper functions which will be used in the mmu context for handling nested nested page faults. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:45 +02:00
Joerg Roedel	2d48a985c7	KVM: MMU: Track NX state in struct kvm_mmu With Nested Paging emulation the NX state between the two MMU contexts may differ. To make sure that always the right fault error code is recorded this patch moves the NX state into struct kvm_mmu so that the code can distinguish between L1 and L2 NX state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:44 +02:00
Joerg Roedel	81407ca553	KVM: MMU: Allow long mode shadows for legacy page tables Currently the KVM softmmu implementation can not shadow a 32 bit legacy or PAE page table with a long mode page table. This is a required feature for nested paging emulation because the nested page table must alway be in host format. So this patch implements the missing pieces to allow long mode page tables for page table types. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:43 +02:00
Joerg Roedel	651dd37a9c	KVM: MMU: Refactor mmu_alloc_roots function This patch factors out the direct-mapping paths of the mmu_alloc_roots function into a seperate function. This makes it a lot easier to avoid all the unnecessary checks done in the shadow path which may break when running direct. In fact, this patch already fixes a problem when running PAE guests on a PAE shadow page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	d41d1895eb	KVM: MMU: Introduce kvm_pdptr_read_mmu This function is implemented to load the pdptr pointers of the currently running guest (l1 or l2 guest). Therefore it takes care about the current paging mode and can read pdptrs out of l2 guest physical memory. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	ff03a073e7	KVM: MMU: Add kvm_mmu parameter to load_pdptrs function This function need to be able to load the pdptrs from any mmu context currently in use. So change this function to take an kvm_mmu parameter to fit these needs. As a side effect this patch also moves the cached pdptrs from vcpu_arch into the kvm_mmu struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:41 +02:00
Joerg Roedel	d47f00a62b	KVM: X86: Propagate fetch faults KVM currently ignores fetch faults in the instruction emulator. With nested-npt we could have such faults. This patch adds the code to handle these. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:41 +02:00
Joerg Roedel	d4f8cf664e	KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa This patch implements logic to make sure that either a page-fault/page-fault-vmexit or a nested-page-fault-vmexit is propagated back to the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:40 +02:00
Joerg Roedel	02f59dc9f1	KVM: MMU: Introduce init_kvm_nested_mmu() This patch introduces the init_kvm_nested_mmu() function which is used to re-initialize the nested mmu when the l2 guest changes its paging mode. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:39 +02:00
Joerg Roedel	3d06b8bfd4	KVM: MMU: Introduce kvm_read_nested_guest_page() This patch introduces the kvm_read_guest_page_x86 function which reads from the physical memory of the guest. If the guest is running in guest-mode itself with nested paging enabled it will read from the guest's guest physical memory instead. The patch also changes changes the code to use this function where it is necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	2329d46d21	KVM: MMU: Make walk_addr_generic capable for two-level walking This patch uses kvm_read_guest_page_tdp to make the walk_addr_generic functions suitable for two-level page table walking. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	ec92fe44e7	KVM: X86: Add kvm_read_guest_page_mmu function This patch adds a function which can read from the guests physical memory or from the guest's guest physical memory. This will be used in the two-dimensional page table walker. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:37 +02:00
Joerg Roedel	6539e738f6	KVM: MMU: Implement nested gva_to_gpa functions This patch adds the functions to do a nested l2_gva to l1_gpa page table walk. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:36 +02:00
Joerg Roedel	14dfe855f9	KVM: X86: Introduce pointer to mmu context used for gva_to_gpa This patch introduces the walk_mmu pointer which points to the mmu-context currently used for gva_to_gpa translations. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:35 +02:00
Joerg Roedel	c30a358d33	KVM: MMU: Add infrastructure for two-level page walker This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:34 +02:00
Joerg Roedel	1e301feb07	KVM: MMU: Introduce generic walk_addr function This is the first patch in the series towards a generic walk_addr implementation which could walk two-dimensional page tables in the end. In this first step the walk_addr function is renamed into walk_addr_generic which takes a mmu context as an additional parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	8df25a328a	KVM: MMU: Track page fault data in struct vcpu This patch introduces a struct with two new fields in vcpu_arch for x86: * fault.address * fault.error_code This will be used to correctly propagate page faults back into the guest when we could have either an ordinary page fault or a nested page fault. In the case of a nested page fault the fault-address is different from the original address that should be walked. So we need to keep track about the real fault-address. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	3241f22da8	KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu This patch changes is_rsvd_bits_set() function prototype to take only a kvm_mmu context instead of a full vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	52fde8df7d	KVM: MMU: Introduce kvm_init_shadow_mmu helper function Some logic of the init_kvm_softmmu function is required to build the Nested Nested Paging context. So factor the required logic into a seperate function and export it. Also make the whole init path suitable for more than one mmu context. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	cb659db8a7	KVM: MMU: Introduce inject_page_fault function pointer This patch introduces an inject_page_fault function pointer into struct kvm_mmu which will be used to inject a page fault. This will be used later when Nested Nested Paging is implemented. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	5777ed340d	KVM: MMU: Introduce get_cr3 function pointer This function pointer in the MMU context is required to implement Nested Nested Paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	1c97f0a04c	KVM: X86: Introduce a tdp_set_cr3 function This patch introduces a special set_tdp_cr3 function pointer in kvm_x86_ops which is only used for tpd enabled mmu contexts. This allows to remove some hacks from svm code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:30 +02:00
Joerg Roedel	f43addd461	KVM: MMU: Make set_cr3 a function pointer in kvm_mmu This is necessary to implement Nested Nested Paging. As a side effect this allows some cleanups in the SVM nested paging code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:29 +02:00
Joerg Roedel	c5a78f2b64	KVM: MMU: Make tdp_enabled a mmu-context parameter This patch changes the tdp_enabled flag from its global meaning to the mmu-context and renames it to direct_map there. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:28 +02:00
Joerg Roedel	957446afce	KVM: MMU: Check for root_level instead of long mode The walk_addr function checks for !is_long_mode in its 64 bit version. But what is meant here is a check for pae paging. Change the condition to really check for pae paging so that it also works with nested nested paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:27 +02:00
Jes Sorensen	7b91409822	KVM: x86: Emulate MSR_EBC_FREQUENCY_ID Some operating systems store data about the host processor at the time of installation, and when booted on a more uptodate cpu tries to read MSR_EBC_FREQUENCY_ID. This has been found with XP. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:27 +02:00
Jes Sorensen	b9a52c4b78	x86: Define MSR_EBC_FREQUENCY_ID Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:26 +02:00
Roedel, Joerg	b75f4eb341	KVM: SVM: Clean up rip handling in vmrun emulation This patch changes the rip handling in the vmrun emulation path from using next_rip to the generic kvm register access functions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:25 +02:00
Joerg Roedel	cda0008299	KVM: SVM: Restore correct registers after sel_cr0 intercept emulation This patch implements restoring of the correct rip, rsp, and rax after the svm emulation in KVM injected a selective_cr0 write intercept into the guest hypervisor. The problem was that the vmexit is emulated in the instruction emulation which later commits the registers right after the write-cr0 instruction. So the l1 guest will continue to run with the l2 rip, rsp and rax resulting in unpredictable behavior. This patch is not the final word, it is just an easy patch to fix the issue. The real fix will be done when the instruction emulator is made aware of nested virtualization. Until this is done this patch fixes the issue and provides an easy way to fix this in -stable too. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:24 +02:00
Joerg Roedel	f87f928882	KVM: MMU: Fix 32 bit legacy paging with NPT This patch fixes 32 bit legacy paging with NPT enabled. The mmu_check_root call on the top-level of the loop causes root_gfn to take values (in the tdp_enabled path) which are outside of guest memory. So the mmu_check_root call fails at some point in the loop interation causing the guest to tiple-fault. This patch changes the mmu_check_root calls to the places where they are really necessary. As a side-effect it introduces a check for the root of a pae page table too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:23 +02:00
Xiao Guangrong	30644b902c	KVM: MMU: lower the aduit frequency The audit is very high overhead, so we need lower the frequency to assure the guest is running. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:59 +02:00
Xiao Guangrong	eb2591865a	KVM: MMU: improve spte audit Both audit_mappings() and audit_sptes_have_rmaps() need to walk vcpu's page table, so we can do these checking in a spte walking Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:58 +02:00
Xiao Guangrong	49edf87806	KVM: MMU: improve active sp audit Both audit_rmap() and audit_write_protection() need to walk all active sp, so we can do these checking in a sp walking Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:57 +02:00
Xiao Guangrong	2f4f337248	KVM: MMU: move audit to a separate file Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:57 +02:00
Xiao Guangrong	8b1fe17cc7	KVM: MMU: support disable/enable mmu audit dynamicly Add a r/w module parameter named 'mmu_audit', it can control audit enable/disable: enable: echo 1 > /sys/module/kvm/parameters/mmu_audit disable: echo 0 > /sys/module/kvm/parameters/mmu_audit This patch not change the logic Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:56 +02:00
Jes Sorensen	84e0cefa8d	KVM: Fix guest kernel crash on MSR_K7_CLK_CTL MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant on said old AMD CPU models. This change returns the expected value, which the Linux kernel is expecting to avoid writing back the MSR, plus it ignores all writes to the MSR. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:55 +02:00
Avi Kivity	9ed049c3b6	KVM: i8259: Make ICW1 conform to spec ICW is not a full reset, instead it resets a limited number of registers in the PIC. Change ICW1 emulation to only reset those registers. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:54 +02:00
Avi Kivity	7d9ddaedd8	KVM: x86 emulator: clean up control flow in x86_emulate_insn() x86_emulate_insn() is full of things like if (rc != X86EMUL_CONTINUE) goto done; break; consolidate all of those at the end of the switch statement. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:54 +02:00
Avi Kivity	a4d4a7c188	KVM: x86 emulator: fix group 11 decoding for reg != 0 These are all undefined. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:53 +02:00
Avi Kivity	b9eac5f4d1	KVM: x86 emulator: use single stage decoding for mov instructions Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:52 +02:00
Avi Kivity	e90aa41e6c	KVM: Don't save/restore MSR_IA32_PERF_STATUS It is read/only; restoring it only results in annoying messages. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:51 +02:00
Marcelo Tosatti	eaa48512ba	KVM: SVM: init_vmcb should reset vcpu->efer Otherwise EFER_LMA bit is retained across a SIPI reset. Fixes guest cpu onlining. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:51 +02:00
Marcelo Tosatti	678041ad9d	KVM: SVM: reset mmu context in init_vmcb Since commit `aad827034e` no mmu reinitialization is performed via init_vmcb. Zero vcpu->arch.cr0 and pass the reset value as a parameter to kvm_set_cr0. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:50 +02:00
Avi Kivity	c41a15dd46	KVM: Fix pio trace direction out = write, in = read, not the other way round. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:49 +02:00
Xiao Guangrong	8e0e8afa82	KVM: MMU: remove count_rmaps() Nothing is checked in count_rmaps(), so remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:49 +02:00
Xiao Guangrong	365fb3fdf6	KVM: MMU: rewrite audit_mappings_page() function There is a bugs in this function, we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in atomic context(kvm_mmu_audit() is called under the spinlock(mmu_lock)'s protection). This patch fix it by: - introduce gfn_to_pfn_atomic instead of gfn_to_pfn - get the mapping gfn from kvm_mmu_page_get_gfn() And it adds 'notrap' ptes check in unsync/direct sps Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:48 +02:00
Xiao Guangrong	bc32ce2152	KVM: MMU: fix wrong not write protected sp report The audit code reports some sp not write protected in current code, it's just the bug in audit_write_protection(), since: - the invalid sp not need write protected - using uninitialize local variable('gfn') - call kvm_mmu_audit() out of mmu_lock's protection Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:47 +02:00
Xiao Guangrong	0beb8d6604	KVM: MMU: check rmap for every spte The read-only spte also has reverse mapping, so fix the code to check them, also modify the function name to fit its doing Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:46 +02:00
Xiao Guangrong	9ad17b1001	KVM: MMU: fix compile warning in audit code fix: arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’: arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’: arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘set_spte’: arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’: arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’ Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:46 +02:00
Jason Wang	23e7a7944f	KVM: pit: Do not check pending pit timer in vcpu thread Pit interrupt injection was done by workqueue, so no need to check pending pit timer in vcpu thread which could lead unnecessary unblocking of vcpu. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:45 +02:00
Avi Kivity	6230f7fc04	KVM: x86 emulator: simplify ALU opcode block decode further The ALU opcode block is very regular; introduce D6ALU() to define decode flags for 6 instructions at a time. Suggested by Paolo Bonzini. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:43 +02:00
Avi Kivity	217fc9cfca	KVM: Fix build error due to 64-bit division in nsec_to_cycles() Use do_div() instead. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:43 +02:00
Avi Kivity	34d1f4905e	KVM: x86 emulator: trap and propagate #DE from DIV and IDIV Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:42 +02:00
Avi Kivity	f6b3597bde	KVM: x86 emulator: add macros for executing instructions that may trap Like DIV and IDIV. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:41 +02:00
Avi Kivity	739ae40606	KVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:41 +02:00
Avi Kivity	d269e3961a	KVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:40 +02:00
Avi Kivity	d2c6c7adb1	KVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:39 +02:00
Avi Kivity	50748613d1	KVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:38 +02:00
Avi Kivity	76e8e68d44	KVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:38 +02:00
Avi Kivity	48fe67b5f7	KVM: x86 emulator: simplify string instruction decode flags Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:37 +02:00
Avi Kivity	5315fbb223	KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:36 +02:00
Avi Kivity	8d8f4e9f66	KVM: x86 emulator: support byte/word opcode pairs Many x86 instructions come in byte and word variants distinguished with bit 0 of the opcode. Add macros to aid in defining them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:35 +02:00
Avi Kivity	081bca0e6b	KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand SrcMemFAddr is not defined with the modrm operand designating a register instead of a memory address. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:35 +02:00
Gleb Natapov	d2ddd1c483	KVM: x86 emulator: get rid of "restart" in emulation context. x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:34 +02:00
Gleb Natapov	3e2f65d57a	KVM: x86 emulator: move string instruction completion check into separate function Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:33 +02:00
Gleb Natapov	6e2fb2cadd	KVM: x86 emulator: Rename variable that shadows another local variable. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:32 +02:00
Wei Yongjun	cc4feed57f	KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:31 +02:00
Xiao Guangrong	189be38db3	KVM: MMU: combine guest pte read between fetch and pte prefetch Combine guest pte read between guest pte check in the fetch path and pte prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:28 +02:00
Xiao Guangrong	957ed9effd	KVM: MMU: prefetch ptes when intercepted guest #PF Support prefetch ptes when intercept guest #PF, avoid to #PF by later access If we meet any failure in the prefetch path, we will exit it and not try other ptes to avoid become heavy path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:27 +02:00
Zachary Amsden	1d5f066e0b	KVM: x86: Fix a possible backwards warp of kvmclock Kernel time, which advances in discrete steps may progress much slower than TSC. As a result, when kvmclock is adjusted to a new base, the apparent time to the guest, which runs at a much higher, nsec scaled rate based on the current TSC, may have already been observed to have a larger value (kernel_ns + scaled tsc) than the value to which we are setting it (kernel_ns + 0). We must instead compute the clock as potentially observed by the guest for kernel_ns to make sure it does not go backwards. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	347bb4448c	x86: pvclock: Move scale_delta into common header The scale_delta function for shift / multiply with 31-bit precision moves to a common header so it can be used by both kernel and kvm module. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	ca84d1a24c	KVM: x86: Add clock sync request to hardware enable If there are active VCPUs which are marked as belonging to a particular hardware CPU, request a clock sync for them when enabling hardware; the TSC could be desynchronized on a newly arriving CPU, and we need to recompute guests system time relative to boot after a suspend event. This covers both cases. Note that it is acceptable to take the spinlock, as either no other tasks will be running and no locks held (BSP after resume), or other tasks will be guaranteed to drop the lock relatively quickly (AP on CPU_STARTING). Noting we now get clock synchronization requests for VCPUs which are starting up (or restarting), it is tempting to attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers at this time, however it is not correct to do so; they are required for systems with non-constant TSC as the frequency may not be known immediately after the processor has started until the cpufreq driver has had a chance to run and query the chipset. Updated: implement better locking semantics for hardware_enable Removed the hack of dropping and retaking the lock by adding the semantic that we always hold kvm_lock when hardware_enable is called. The one place that doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock won't be taken. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	46543ba45f	KVM: x86: Robust TSC compensation Make the match of TSC find TSC writes that are close to each other instead of perfectly identical; this allows the compensator to also work in migration / suspend scenarios. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	759379dd68	KVM: x86: Add helper functions for time computation Add a helper function to compute the kernel time and convert nanoseconds back to CPU specific cycles. Note that these must not be called in preemptible context, as that would mean the kernel could enter software suspend state, which would cause non-atomic operation. Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel time helper, these should be bootbased as well. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	48434c20e1	KVM: x86: Fix deep C-state TSC desynchronization When CPUs with unstable TSCs enter deep C-state, TSC may stop running. This causes us to require resynchronization. Since we can't tell when this may potentially happen, we assume the worst by forcing re-compensation for it at every point the VCPU task is descheduled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	e48672fa25	KVM: x86: Unify TSC logic Move the TSC control logic from the vendor backends into x86.c by adding adjust_tsc_offset to x86 ops. Now all TSC decisions can be done in one place. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	6755bae8e6	KVM: x86: Warn about unstable TSC If creating an SMP guest with unstable host TSC, issue a warning Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	8cfdc00085	KVM: x86: Make cpu_tsc_khz updates use local CPU This simplifies much of the init code; we can now simply always call tsc_khz_changed, optionally passing it a new value, or letting it figure out the existing value (while interrupts are disabled, and thus, by inference from the rule, not raceful against CPU hotplug or frequency updates, which will issue IPIs to the local CPU to perform this very same task). Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	f38e098ff3	KVM: x86: TSC reset compensation Attempt to synchronize TSCs which are reset to the same value. In the case of a reliable hardware TSC, we can just re-use the same offset, but on non-reliable hardware, we can get closer by adjusting the offset to match the elapsed time. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	99e3e30aee	KVM: x86: Move TSC offset writes to common code Also, ensure that the storing of the offset and the reading of the TSC are never preempted by taking a spinlock. While the lock is overkill now, it is useful later in this patch series. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	f4e1b3c8bd	KVM: x86: Convert TSC writes to TSC offset writes Change svm / vmx to be the same internally and write TSC offset instead of bare TSC in helper functions. Isolated as a single patch to contain code movement. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	ae38436b78	KVM: x86: Drop vm_init_tsc This is used only by the VMX code, and is not done properly; if the TSC is indeed backwards, it is out of sync, and will need proper handling in the logic at each and every CPU change. For now, drop this test during init as misguided. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Wei Yongjun	45bf21a8ce	KVM: MMU: fix missing percpu counter destroy commit ad05c88266b4cce1c820928ce8a0fb7690912ba1 (KVM: create aggregate kvm_total_used_mmu_pages value) introduce percpu counter kvm_total_used_mmu_pages but never destroy it, this may cause oops when rmmod & modprobe. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Xiaotian Feng	80b63faf02	KVM: MMU: fix regression from rework mmu_shrink() code Latest kvm mmu_shrink code rework makes kernel changes kvm->arch.n_used_mmu_pages/ kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called by kvm_mmu_commit_zap_page. So the kvm->arch.n_used_mmu_pages or kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(), This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages loops forever. Moving kvm_mmu_commit_zap_page would make the while loop performs as normal. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Tested-by: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Wei Yongjun	e4abac67b7	KVM: x86 emulator: add JrCXZ instruction emulation Add JrCXZ instruction emulation (opcode 0xe3) Used by FreeBSD boot loader. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:20 +02:00
Wei Yongjun	09b5f4d3c4	KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation Add LDS/LES/LFS/LGS/LSS instruction emulation. (opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:20 +02:00
Dave Hansen	45221ab668	KVM: create aggregate kvm_total_used_mmu_pages value Of slab shrinkers, the VM code says: * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is * querying the cache size, so a fastpath for that case is appropriate. and it means it. Look at how it calls the shrinkers: nr_before = (shrinker->shrink)(0, gfp_mask); shrink_ret = (shrinker->shrink)(this_scan, gfp_mask); So, if you do anything stupid in your shrinker, the VM will doubly punish you. The mmu_shrink() function takes the global kvm_lock, then acquires every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then we're going to take 101 locks. We do it twice, so each call takes 202 locks. If we're under memory pressure, we can have each cpu trying to do this. It can get really hairy, and we've seen lock spinning in mmu_shrink() be the dominant entry in profiles. This is guaranteed to optimize at least half of those lock aquisitions away. It removes the need to take any of the locks when simply trying to count objects. A 'percpu_counter' can be a large object, but we only have one of these for the entire system. There are not any better alternatives at the moment, especially ones that handle CPU hotplug. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:19 +02:00
Dave Hansen	49d5ca2663	KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages Doing this makes the code much more readable. That's borne out by the fact that this patch removes code. "used" also happens to be the number that we need to return back to the slab code when our shrinker gets called. Keeping this value as opposed to free makes the next patch simpler. So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a structure member (and not a pointer) of 'struct kvm'. That means they start out zeroed. I _think_ they get initialized properly by kvm_mmu_change_mmu_pages(). But, that only happens via kvm ioctls. Another benefit of storing 'used' intead of 'free' is that the values are consistent from the moment the structure is allocated: no negative "used" value. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	39de71ec53	KVM: rename x86 kvm->arch.n_alloc_mmu_pages arch.n_alloc_mmu_pages is a poor choice of name. This value truly means, "the number of pages which _may_ be allocated". But, reading the name, "n_alloc_mmu_pages" implies "the number of allocated mmu pages", which is dead wrong. It's really the high watermark, so let's give it a name to match: nr_max_mmu_pages. This change will make the next few patches much more obvious and easy to read. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	e0df7b9f6c	KVM: abstract kvm x86 mmu->n_free_mmu_pages "free" is a poor name for this value. In this context, it means, "the number of mmu pages which this kvm instance should be able to allocate." But "free" implies much more that the objects are there and ready for use. "available" is a much better description, especially when you see how it is calculated. In this patch, we abstract its use into a function. We'll soon replace the function's contents by calculating the value in a different way. All of the reads of n_free_mmu_pages are taken care of in this patch. The modification sites will be handled in a patch later in the series. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:17 +02:00
Avi Kivity	6142914280	KVM: x86 emulator: implement CWD (opcode 99) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:16 +02:00
Avi Kivity	d46164dbd9	KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:16 +02:00
Avi Kivity	7db41eb762	KVM: x86 emulator: add Src2Imm decoding Needed for 3-operand IMUL. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:15 +02:00
Avi Kivity	39f21ee546	KVM: x86 emulator: consolidate immediate decode into a function Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:14 +02:00
Avi Kivity	48bb5d3c40	KVM: x86 emulator: implement RDTSC (opcode 0F 31) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:14 +02:00
Avi Kivity	7077aec0bc	KVM: x86 emulator: remove SrcImplicit Useless. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:13 +02:00
Avi Kivity	5c82aa2998	KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	f3a1b9f496	KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	40ece7c729	KVM: x86 emulator: implement RET imm16 (opcode C2) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	b250e60589	KVM: x86 emulator: add SrcImmU16 operand type Used for RET NEAR instructions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	0ef753b8c3	KVM: x86 emulator: implement CALL FAR (FF /3) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	7af04fc05c	KVM: x86 emulator: implement DAS (opcode 2F) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	fb2c264105	KVM: x86 emulator: Use a register for ____emulate_2op() destination Most x86 two operand instructions allow the destination to be a memory operand, but IMUL (for example) requires that the destination be a register. Change ____emulate_2op() to take a register for both source and destination so we can invoke IMUL. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	b3b3d25a12	KVM: x86 emulator: pass destination type to ____emulate_2op() We'll need it later so we can use a register for the destination. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Wei Yongjun	f2f3184534	KVM: x86 emulator: add LOOP/LOOPcc instruction emulation Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Wei Yongjun	e8b6fa70e3	KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation Add CBW/CWDE/CDQE instruction emulation.(opcode 0x98) Used by FreeBSD's boot loader. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	0fa6ccbd28	KVM: x86 emulator: fix REPZ/REPNZ termination condition EFLAGS.ZF needs to be checked after each iteration, not before. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:10 +02:00
Avi Kivity	f6b33fc504	KVM: x86 emulator: implement SCAS (opcodes AE, AF) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:10 +02:00
Avi Kivity	5c56e1cf7a	KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS emulate_push() only schedules a push; it doesn't actually push anything. Call writeback() to flush out the write. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	a13a63faa6	KVM: x86 emulator: remove dup code of in/out instruction Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	41167be544	KVM: x86 emulator: change OUT instruction to use dst instead of src Change OUT instruction to use dst instead of src, so we can reuse those code for all out instructions. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	943858e275	KVM: x86 emulator: introduce DstImmUByte for dst operand decode Introduce DstImmUByte for dst operand decode, which will be used for out instruction. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	c483c02ad3	KVM: x86 emulator: remove useless label from x86_emulate_insn() Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	ee45b58efe	KVM: x86 emulator: add setcc instruction emulation Add setcc instruction emulation (opcode 0x0f 0x90~0x9f) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:08 +02:00
Wei Yongjun	92f738a52b	KVM: x86 emulator: add XADD instruction emulation Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:06 +02:00
Wei Yongjun	31be40b398	KVM: x86 emulator: put register operand write back to a function Introduce function write_register_operand() to write back the register operand. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:06 +02:00
Mohammed Gamal	8ec4722dd2	KVM: Separate emulation context initialization in a separate function The code for initializing the emulation context is duplicated at two locations (emulate_instruction() and kvm_task_switch()). Separate it in a separate function and call it from there. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Wei Yongjun	d9574a25af	KVM: x86 emulator: add bsf/bsr instruction emulation Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Mohammed Gamal	8c5eee30a9	KVM: x86 emulator: Fix emulate_grp3 return values This patch lets emulate_grp3() return X86EMUL_* return codes instead of hardcoded ones. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Mohammed Gamal	3f9f53b0d5	KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7). Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Wei Yongjun	ba7ff2b76d	KVM: x86 emulator: mask group 8 instruction as BitOp Mask group 8 instruction as BitOp, so we can share the code for adjust the source operand. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:03 +02:00
Wei Yongjun	3885f18fe3	KVM: x86 emulator: do not adjust the address for immediate source adjust the dst address for a register source but not adjust the address for an immediate source. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:02 +02:00
Wei Yongjun	35c843c485	KVM: x86 emulator: fix negative bit offset BitOp instruction emulation If bit offset operands is a negative number, BitOp instruction will return wrong value. This patch fix it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:01 +02:00
Mohammed Gamal	8744aa9aad	KVM: x86 emulator: Add stc instruction (opcode 0xf9) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:01 +02:00
Wei Yongjun	c034da8b92	KVM: x86 emulator: using SrcOne for instruction d0/d1 decoding Using SrcOne for instruction d0/d1 decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Wei Yongjun	36089fed70	KVM: x86 emulator: disable writeback when decode dest operand This patch change to disable writeback when decode dest operand if the dest type is ImplicitOps or not specified. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Wei Yongjun	06cb704611	KVM: x86 emulator: use SrcAcc to simplify stos decoding Use SrcAcc to simplify stos decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Mohammed Gamal	6e154e56b4	KVM: x86 emulator: Add into, int, and int3 instructions (opcodes 0xcc-0xce) This adds support for int instructions to the emulator. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Mohammed Gamal	160ce1f1a8	KVM: x86 emulator: Allow accessing IDT via emulator ops The patch adds a new member get_idt() to x86_emulate_ops. It also adds a function to get the idt in order to be used by the emulator. This is needed for real mode interrupt injection and the emulation of int instructions. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Wei Yongjun	d3ad624329	KVM: x86 emulator: simplify two-byte opcode check Two-byte opcode always start with 0x0F and the decode flags of opcode 0xF0 is always 0, so remove dup check. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Alexander Graf	ba49296236	KVM: Move kvm_guest_init out of generic code Currently x86 is the only architecture that uses kvm_guest_init(). With PowerPC we're getting a second user, but the signature is different there and we don't need to export it, as it uses the normal kernel init framework. So let's move the x86 specific definition of that function over to the x86 specfic header file. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:49 +02:00
Mohammed Gamal	34698d8c61	KVM: x86 emulator: Fix nop emulation If a nop instruction is encountered, we jump directly to the done label. This skip updating rip. Break from the switch case instead Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:41 +02:00
Avi Kivity	2dbd0dd711	KVM: x86 emulator: Decode memory operands directly into a 'struct operand' Since modrm operand can be either register or memory, decoding it into a 'struct operand', which can represent both, is simpler. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:40 +02:00
Avi Kivity	1f6f05800e	KVM: x86 emulator: change invlpg emulation to use src.mem.addr Instead of using modrm_ea, which will soon be gone. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:39 +02:00
Avi Kivity	342fc63095	KVM: x86 emulator: switch LEA to use SrcMem decoding The NoAccess flag will prevent memory from being accessed. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:38 +02:00
Avi Kivity	5a506b125f	KVM: x86 emulator: add NoAccess flag for memory instructions that skip access Use for INVLPG, which accesses the tlb, not memory. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:37 +02:00
Avi Kivity	b27f38563d	KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg for reg op This is an ordinary modrm source or destination; use the standard structure representing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:36 +02:00
Avi Kivity	1a0c7d44e4	KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg for reg op This is an ordinary modrm source or destination; use the standard structure representing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	cecc9e3916	KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in long mode Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	7f9b4b75be	KVM: x86 emulator: introduce Op3264 for mov cr and mov dr instructions The operands for these instructions are 32 bits or 64 bits, depending on long mode, and ignoring REX prefixes, or the operand size prefix. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	1e87e3efe7	KVM: x86 emulator: simplify REX.W check (x && (x & y)) == (x & y) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	d4709c78ee	KVM: x86 emulator: drop use_modrm_ea Unused (and has never been). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	91ff3cb43c	KVM: x86 emulator: put register operand fetch into a function The code is repeated three times, put it into fetch_register_operand() Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	3d9e77dff8	KVM: x86 emulator: use SrcAcc to simplify xchg decoding Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	4515453964	KVM: x86 emulator: simplify xchg decode tables Use X8() to avoid repetition. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	1a6440aef6	KVM: x86 emulator: use correct type for memory address in operands Currently we use a void pointer for memory addresses. That's wrong since these are guest virtual addresses which are not directly dereferencable by the host. Use the correct type, unsigned long. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	09ee57cdae	KVM: x86 emulator: push segment override out of decode_modrm() Let it compute modrm_seg instead, and have the caller apply it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Joerg Roedel	dbe7758482	KVM: SVM: Check for asid != 0 on nested vmrun This patch lets a nested vmrun fail if the L1 hypervisor left the asid zero. This fixes the asid_zero unit test. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Joerg Roedel	52c65a30a5	KVM: SVM: Check for nested vmrun intercept before emulating vmrun This patch lets the nested vmrun fail if the L1 hypervisor has not intercepted vmrun. This fixes the "vmrun intercept check" unit test. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Xiao Guangrong	4132779b17	KVM: MMU: mark page dirty only when page is really written Mark page dirty only when this page is really written, it's more exacter, and also can fix dirty page marking in speculation path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Xiao Guangrong	8672b7217a	KVM: MMU: move bits lost judgement into a separate function Introduce spte_has_volatile_bits() function to judge whether spte bits will miss, it's more readable and can help us to cleanup code later Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:31 +02:00
Xiao Guangrong	251464c464	KVM: MMU: using kvm_set_pfn_accessed() instead of mark_page_accessed() It's a small cleanup that using using kvm_set_pfn_accessed() instead of mark_page_accessed() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:30 +02:00
Gleb Natapov	4fc40f076f	KVM: x86 emulator: check io permissions only once for string pio Do not recheck io permission on every iteration. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:29 +02:00
Avi Kivity	9928ff608b	KVM: x86 emulator: fix LMSW able to clear cr0.pe LMSW is documented not to be able to clear cr0.pe; make it so. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:28 +02:00
Gleb Natapov	e85d28f8e8	KVM: x86 emulator: don't update vcpu state if instruction is restarted No need to update vcpu state since instruction is in the middle of the emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:27 +02:00
Avi Kivity	63540382cc	KVM: x86 emulator: convert some push instructions to direct decode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:26 +02:00
Avi Kivity	d0e533255d	KVM: x86 emulator: allow repeat macro arguments to contain commas Needed for repeating instructions with execution functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:25 +02:00
Avi Kivity	73fba5f4fe	KVM: x86 emulator: move decode tables downwards So they can reference execution functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:25 +02:00
Avi Kivity	dde7e6d12a	KVM: x86 emulator: move x86_decode_insn() downwards No code changes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:24 +02:00
Avi Kivity	ef65c88912	KVM: x86 emulator: allow storing emulator execution function in decode tables Instead of looking up the opcode twice (once for decode flags, once for the big execution switch) look up both flags and function in the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:22 +02:00
Avi Kivity	9aabc88fc8	KVM: x86 emulator: store x86_emulate_ops in emulation context It doesn't ever change, so we don't need to pass it around everywhere. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:21 +02:00
Avi Kivity	ab85b12b1a	KVM: x86 emulator: move ByteOp and Dst back to bits 0:3 Now that the group index no longer exists, the space is free. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:20 +02:00
Avi Kivity	3885d530b0	KVM: x86 emulator: drop support for old-style groups Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:19 +02:00
Avi Kivity	9f5d3220e3	KVM: x86 emulator: convert group 9 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:18 +02:00
Avi Kivity	2cb20bc8af	KVM: x86 emulator: convert group 8 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:18 +02:00
Avi Kivity	2f3a9bc9eb	KVM: x86 emulator: convert group 7 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:16 +02:00
Avi Kivity	b67f9f0741	KVM: x86 emulator: convert group 5 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:15 +02:00
Avi Kivity	591c9d20a3	KVM: x86 emulator: convert group 4 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:14 +02:00
Avi Kivity	ee70ea30ee	KVM: x86 emulator: convert group 3 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:13 +02:00
Avi Kivity	99880c5cd5	KVM: x86 emulator: convert group 1A to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:12 +02:00
Avi Kivity	5b92b5faff	KVM: x86 emulator: convert group 1 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:11 +02:00
Avi Kivity	120df8902d	KVM: x86 emulator: allow specifying group directly in opcode Instead of having a group number, store the group table pointer directly in the opcode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:10 +02:00
Avi Kivity	793d5a8d6b	KVM: x86 emulator: reserve group code 0 We'll be using that to distinguish between new-style and old-style groups. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:09 +02:00
Avi Kivity	42a1c52095	KVM: x86 emulator: move group tables to top No code changes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:08 +02:00
Avi Kivity	fd853310a1	KVM: x86 emulator: Add wrappers for easily defining opcodes Once 'struct opcode' grows, its initializer will become more complicated. Wrap the simple initializers in a D() macro, and replace the empty initializers with an even simpler N macro. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:08 +02:00
Avi Kivity	d65b1dee40	KVM: x86 emulator: introduce 'struct opcode' This will hold all the information known about the opcode. Currently, this is just the decode flags. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:07 +02:00
Avi Kivity	ea9ef04e19	KVM: x86 emulator: drop parentheses in repreat macros The parenthese make is impossible to use the macros with initializers that require braces. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:06 +02:00
Mohammed Gamal	62bd430e6d	KVM: x86 emulator: Add IRET instruction Ths patch adds IRET instruction (opcode 0xcf). Currently, only IRET in real mode is emulated. Protected mode support is to be added later if needed. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Reviewed-by: Avi Kivity <avi@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:05 +02:00
Joerg Roedel	7a190667bb	KVM: SVM: Emulate next_rip svm feature This patch implements the emulations of the svm next_rip feature in the nested svm implementation in kvm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:04 +02:00
Joerg Roedel	3f6a9d1693	KVM: SVM: Sync efer back into nested vmcb This patch fixes a bug in a nested hypervisor that heavily switches between real-mode and long-mode. The problem is fixed by syncing back efer into the guest vmcb on emulated vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:03 +02:00
Xiao Guangrong	19ada5c4b6	KVM: MMU: remove valueless output message After commit 53383eaad08d, the '*spte' has updated before call rmap_remove()(in most case it's 'shadow_trap_nonpresent_pte'), so remove this information from error message Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:02 +02:00
Avi Kivity	d359192fea	KVM: VMX: Use host_gdt variable wherever we need the host gdt Now that we have the host gdt conveniently stored in a variable, make use of it instead of querying the cpu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:01 +02:00
Avi Kivity	e071edd5ba	KVM: x86 emulator: unify the two Group 3 variants Use just one group table for byte (F6) and word (F7) opcodes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:00 +02:00
Avi Kivity	dfe11481d8	KVM: x86 emulator: Allow LOCK prefix for NEG and NOT Opcodes F6/2, F6/3, F7/2, F7/3. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:59 +02:00
Avi Kivity	4968ec4e26	KVM: x86 emulator: simplify Group 1 decoding Move operand decoding to the opcode table, keep lock decoding in the group table. This allows us to get consolidate the four variants of Group 1 into one group. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:58 +02:00
Avi Kivity	52811d7de5	KVM: x86 emulator: mix decode bits from opcode and group decode tables Allow bits that are common to all members of a group to be specified in the opcode table instead of the group table. This allows some simplification of the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:58 +02:00
Avi Kivity	047a481809	KVM: x86 emulator: add Undefined decode flag Add a decode flag to indicate the instruction is invalid. Will come in useful later, when we mix decode bits from the opcode and group table. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:57 +02:00
Avi Kivity	2ce495365f	KVM: x86 emulator: Make group storage bits separate from operand bits Currently group bits are stored in bits 0:7, where operand bits are stored. Make group bits be 0:3, and move the existing bits 0:3 to 16:19, so we can mix group and operand bits. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:55 +02:00
Avi Kivity	880a188378	KVM: x86 emulator: consolidate Jcc rel32 decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:55 +02:00
Avi Kivity	be8eacddbd	KVM: x86 emulator: consolidate CMOVcc decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:53 +02:00
Avi Kivity	b6e6153885	KVM: x86 emulator: consolidate MOV reg, imm decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:52 +02:00
Avi Kivity	b3ab3405fe	KVM: x86 emulator: consolidate Jcc rel8 decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:51 +02:00
Avi Kivity	3849186c38	KVM: x86 emulator: consolidate push/pop reg decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:49 +02:00
Avi Kivity	749358a6b4	KVM: x86 emulator: consolidate inc/dec reg decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:48 +02:00
Avi Kivity	83babbca46	KVM: x86 emulator: add macros for repetitive instructions Some instructions are repetitive in the opcode space, add macros for consolidating them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:48 +02:00
Avi Kivity	91269b8f94	KVM: x86 emulator: fix handling for unemulated instructions If an instruction is present in the decode tables but not in the execution switch, it will be emulated as a NOP. An example is IRET (0xcf). Fix by adding default: labels to the execution switches. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:47 +02:00
Jiri Slaby	e4072a9a9d	x86, printk: Get rid of <0> from stack output The stack output currently looks like this: 7fffffffffffffff 0000000a00000000 ffffffff81093341 0000000000000046 <0> ffff88003a545fd8 0000000000000000 0000000000000000 00007fffa39769c0 <0> ffff88003e403f58 ffffffff8102fc4c ffff88003e403f58 ffff88003e403f78 The superfluous <0> are caused by recent printk KERN_CONT change. <*> is now ignored in printk unless some text follows the level and even then it still has to be the first in the format message. Note that the log_lvl parameter is now completely ignored in show_stack_log_lvl and the stack is dumped with the default level (like for quite some time already). It behaves the same as the rest of the dump, function traces are dumped in the very same manner. Only Code and maybe some lines are printed with EMERG level. Unfortunately I see no way how to fix this conceptually to have the whole oops/BUG/panic output with the same level, so this removed only the superfluous characters for the time being. Just for illustration: <4>Process kworker/0:0 (pid: 0, threadinfo ffff88003c8a6000, task ffff88003c85c100) <0>Stack: <4> ffffffff818022c0 0000000a00000001 0000000000000001 0000000000000046 <4> ffff88003c8a7fd8 0000000000000001 ffff88003c8a7e58 0000000000000000 <4> ffff88003e503f48 ffffffff8102fc4c ffff88003e503f48 ffff88003e503f68 <0>Call Trace: <0> <IRQ> <4> [<ffffffff8102fc4c>] ? call_softirq+0x1c/0x30 ... <0>Code: 00 01 00 00 65 8b 04 25 80 c5 00 00 c7 45 ... Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: jirislaby@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1287586131-16222-1-git-send-email-jslaby@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-23 20:03:03 +02:00
Thomas Gleixner	23f9b26715	x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() probe_br_irqs_gsi() is called right after ioapic_init_mappings() and there are no other users. Move it into ioapic_init_mappings() so the declaration can disappear and the function can become static. Rename ioapic_init_mappings() to ioapic_and_gsi_init() to reflect that change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1287510389-8388-2-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>	2010-10-23 17:27:50 +02:00
Thomas Gleixner	5a7ae78fd4	x86: Allow platforms to force enable apic Some embedded x86 platforms don't setup the APIC in the BIOS/bootloader and would be forced to add "lapic" on the kernel command line. That's a bit akward. Split out the force enable code from detect_init_APIC() and allow platform code to call it from the platform setup. That avoids the command line parameter and possible replication of the MSR dance in the force enable code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1287510389-8388-1-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>	2010-10-23 17:27:43 +02:00
Linus Torvalds	02f36038c5	Merge branches 'softirq-for-linus', 'x86-debug-for-linus', 'x86-numa-for-linus', 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softirqs: Make wakeup_softirqd static * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, asm: Restore parentheses around one pushl_cfi argument x86, asm: Fix ancient-GAS workaround x86, asm: Fix CFI macro invocations to deal with shortcomings in gas * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA * 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: HPET force enable for CX700 / VIA Epia LT * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, setup: Use string copy operation to optimze copy in kernel compression * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Use allocated buffer in tlb_uv.c:tunables_read() * 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.	2010-10-23 08:25:36 -07:00
Linus Torvalds	10f2a2b0f6	Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, mm: Add an initial page table for core bootstrapping	2010-10-22 20:37:50 -07:00
Linus Torvalds	8814011679	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kdb,debug_core: adjust master cpu switch logic against new debug_core locking debug_core: refactor locking for master/slave cpus x86,kgdb: remove unnecessary call to kgdb_correct_hw_break() debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter() kdb,kgdb: fix sparse fixups kdb: Fix oops in kdb_unregister kdb,ftdump: Remove reference to internal kdb include kdb: Allow kernel loadable modules to add kdb shell functions debug_core: stop rcu warnings on kernel resume debug_core: move all watch dog syncs to a single function x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35	2010-10-22 20:35:12 -07:00
Linus Torvalds	0fc0531e0a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: update comments to reflect that percpu allocations are always zero-filled percpu: Optimize __get_cpu_var() x86, percpu: Optimize this_cpu_ptr percpu: clear memory allocated with the km allocator percpu: fix build breakage on s390 and cleanup build configuration tests percpu: use percpu allocator on UP too percpu: reduce PCPU_MIN_UNIT_SIZE to 32k vmalloc: pcpu_get/free_vm_areas() aren't needed on UP Fixed up trivial conflicts in include/linux/percpu.h	2010-10-22 17:31:36 -07:00
H. Peter Anvin	fd35fbcdd1	x86-64, asm: Use fxsaveq/fxrestorq in more places Checkin `d7acb92fea` made use of fxsaveq in fpu_fxsave() if the assembler supports it; this adds fxsaveq/fxrstorq to fxrstor_checking() and fxsave_user() as well. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <AANLkTi=RKyHLNTq6iomZOXkc6Zw1j9iAgsq8388XmzwN@mail.gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-22 15:33:38 -07:00
Dongdong Deng	39a0715f5a	x86,kgdb: remove unnecessary call to kgdb_correct_hw_break() The kernel debug_core invokes hw breakpoint install and removal via call backs. The architecture specific kgdb stubs only need to implement the call backs and not actually call the functions. Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> CC: x86@kernel.org CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com>	2010-10-22 15:34:13 -05:00
Jason Wessel	91b152aa85	kdb,kgdb: fix sparse fixups Fix the following sparse warnings: kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static? kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static? kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces) kgdb.c:652:26: expected void const ptr kgdb.c:652:26: got struct perf_event [noderef] <asn:3>pev The one in kgdb.c required the (void __force) because of the return code from register_wide_hw_breakpoint looking like: return (void __percpu __force *)ERR_PTR(err); Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-10-22 15:34:12 -05:00
Jason Wessel	fad99fac26	x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35 HW breakpoints events stopped working correctly with kgdb as a result of commit: `018cbffe68` (Merge commit 'v2.6.33' into perf/core), later commit: `ba773f7c51` (x86,kgdb: Fix hw breakpoint regression) allowed breakpoints to propagate to the debugger core but did not completely address the original regression in functionality found in 2.6.35. When the DR_STEP flag is set in dr6 along with any of the DR_TRAP bits, the kgdb exception handler will enter once from the hw_breakpoint API call back and again from the die notifier for do_debug(), which causes the debugger to stop twice and also for the kgdb regression tests to fail running under kvm with: echo V2I1 > /sys/module/kgdbts/parameters/kgdbts To address the problem, the kgdb overflow handler needs to implement the same logic as the ptrace overflow handler call back with respect to updating the virtual copy of dr6. This will allow the kgdb do_debug() die notifier to properly handle the exception and the attached debugger, or kgdb test suite, will only receive a single notification. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: x86@kernel.org	2010-10-22 15:34:10 -05:00
Stefano Stabellini	0e058e5277	xen: add a missing #include to arch/x86/pci/xen.c Add missing #include <asm/io_apic.h> to arch/x86/pci/xen.c. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-10-22 21:26:02 +01:00
Stefano Stabellini	ff12849a7a	xen: mask the MTRR feature from the cpuid We don't want Linux to think that the cpu supports MTRRs when running under Xen because MTRR operations could only be performed through hypercalls. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:26:02 +01:00
Juan Quintela	4ec5387cc3	xen: add the direct mapping area for ISA bus access add the direct mapping area for ISA bus access when running as initial domain Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:47 +01:00
Stefano Stabellini	801fd14a72	xen: use vcpu_ops to setup cpu masks Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:45 +01:00
Jeremy Fitzhardinge	98511f3532	xen: map a dummy page for local apic and ioapic in xen_set_fixmap Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:44 +01:00
Qing He	f731e3ef02	xen: remap MSIs into pirqs when running as initial domain Implement xen_create_msi_irq to create an msi and remap it as pirq. Use xen_create_msi_irq to implement an initial domain specific version of setup_msi_irqs. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:44 +01:00
Jeremy Fitzhardinge	38aa66fcb7	xen: remap GSIs as pirqs when running as initial domain Implement xen_register_gsi to setup the correct triggering and polarity properties of a gsi. Implement xen_register_pirq to register a particular gsi as pirq and receive interrupts as events. Call xen_setup_pirqs to register all the legacy ISA irqs as pirqs. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:43 +01:00
Stefano Stabellini	6b0661a5e6	xen: introduce XEN_DOM0 as a silent option Add XEN_DOM0 to arch/x86/xen/Kconfig as a silent compile time option that gets enabled when xen and basic x86, acpi and pci support are selected. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:43 +01:00
Stefano Stabellini	809f9267bb	xen: map MSIs into pirqs Map MSIs into pirqs, writing 0 in the MSI vector data field and the pirq number in the MSI destination id field. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:43 +01:00
Stefano Stabellini	3942b740e5	xen: support GSI -> pirq remapping in PV on HVM guests Disable pcifront when running on HVM: it is meant to be used with pv guests that don't have PCI bus. Use acpi_register_gsi_xen_hvm to remap GSIs into pirqs. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge	90f6881e64	xen: add xen hvm acpi_register_gsi variant Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge	2f065aef17	acpi: use indirect call to register gsi in different modes Rather than using a tree of conditionals, use function pointer for acpi_register_gsi. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-10-22 21:25:41 +01:00
Stefano Stabellini	42a1de56f3	xen: implement xen_hvm_register_pirq xen_hvm_register_pirq allows the kernel to map a GSI into a Xen pirq and receive the interrupt as an event channel from that point on. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:41 +01:00
Stefano Stabellini	67ba37293e	Merge commit 'konrad/stable/xen-pcifront-0.8.2' into 2.6.36-rc8-initial-domain-v6	2010-10-22 21:24:06 +01:00
Ian Campbell	9e9a5fcb04	xen: use host E820 map for dom0 When running as initial domain, get the real physical memory map from xen using the XENMEM_machine_memory_map hypercall and use it to setup the e820 regions. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 13:19:19 -07:00
Ian Campbell	375b2a9ada	xen: correctly rebuild mfn list list after migration. Otherwise the second migration attempt fails because the mfn_list_list still refers to all the old mfns. We need to update the entires in both p2m_top_mfn and the mid_mfn pages which p2m_top_mfn refers to. In order to do this we need to keep track of the virtual addresses mapping the p2m_mid_mfn pages since we cannot rely on mfn_to_virt(p2m_top_mfn[idx]) since p2m_top_mfn[idx] will still contain the old MFN after a migration, which may now belong to another domain and hence have a different mapping in the m2p. Therefore add and maintain a third top level page, p2m_top_mfn_p[], which tracks the virtual addresses of the mfns contained in p2m_top_mfn[]. We also need to update the content of the p2m_mid_missing_mfn page on resume to refer to the page's new mfn. p2m_missing does not need updating since the migration process takes care of the leaf p2m pages for us. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:36 -07:00
Jeremy Fitzhardinge	3654581e47	xen: don't add extra_pages for RAM after mem_end If an E820 region is entirely beyond mem_end, don't attempt to truncate it and add the truncated pages to extra_pages, as they will be negative. Also, make sure the extra memory region starts after all BIOS provided E820 regions (and in the case of RAM regions, post-clipping). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:32 -07:00
Jeremy Fitzhardinge	41f2e4771a	xen: add support for PAT Convert Linux PAT entries into Xen ones when constructing ptes. Linux doesn't use _PAGE_PAT for ptes, so the only difference in the first 4 entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default) use it for WT. xen_pte_val does the inverse conversion. We hard-code assumptions about Linux's current PAT layout, but a warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems. If necessary we could go to a more general table-based conversion between Linux and Xen PAT entries. hugetlbfs poses a problem at the moment, the x86 architecture uses the same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending on which pagetable level we're using. At the moment this should be OK so long as nobody tries to do a pte_val on a hugetlbfs pte. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:31 -07:00
Jeremy Fitzhardinge	2f7acb2085	xen: make sure xen_max_p2m_pfn is up to date Keep xen_max_p2m_pfn up to date with the end of the extra memory we're adding. It is possible that it will be too high since memory may be truncated by a "mem=" option on the kernel command line, but that won't matter. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:30 -07:00
Jeremy Fitzhardinge	698bb8d14a	xen: limit extra memory to a certain ratio of base If extra memory is very much larger than the base memory size then all of the base memory can be filled with structures reserved to describe the extra memory, leaving no space for anything else. Even at the maximum ratio there will be little space for anything else, but this change is intended to at least allow the system to boot rather than crash mysteriously. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge	b5b43ced7a	xen: add extra pages for E820 RAM regions, even if beyond mem_end If an entire E820 RAM region is beyond mem_end, still add its pages to the extra area so that space can be used by the kernel. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge	36bc251b87	xen: make sure xen_extra_mem_start is beyond all non-RAM e820 If Xen gives us non-RAM E820 entries (dom0 only, typically), then make sure the extra RAM region is beyond them. It's OK for the extra space to grow into E820 regions, however. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:28 -07:00
Jeremy Fitzhardinge	42ee1471e9	xen: implement "extra" memory to reserve space for pages not present at boot When using the e820 map to get the initial pseudo-physical address space, look for either Xen-provided memory which doesn't lie within an E820 region, or an E820 RAM region which extends beyond the Xen-provided memory range. Count these pages, and add them to a new "extra memory" range. This range has an E820 RAM range to describe it - so the kernel will allocate page structures for it - but it is also marked reserved so that the kernel will not attempt to use it. The balloon driver can then add this range as a set of currently ballooned-out pages, which can be used to extend the domain beyond its original size. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:27 -07:00
Ian Campbell	35ae11fd14	xen: Use host-provided E820 map Rather than simply using a flat memory map from Xen, use its provided E820 map. This allows the domain builder to tell the domain to reserve space for more pages than those initially provided at domain-build time. It also allows the host to specify holes in the address space (for PCI-passthrough, for example). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:27 -07:00
Jeremy Fitzhardinge	cfd8951e08	xen: don't map missing memory When setting up a pte for a missing pfn (no matching mfn), just create an empty pte rather than a junk mapping. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:26 -07:00
Jeremy Fitzhardinge	33a847502b	xen: defer building p2m mfn structures until kernel is mapped When building mfn parts of p2m structure, we rely on being able to use mfn_to_virt, which in turn requires kernel to be mapped into the linear area (which is distinct from the kernel image mapping on 64-bit). Defer calling xen_build_mfn_list_list() until after xen_setup_kernel_pagetable(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge	c3798062f1	xen: add return value to set_phys_to_machine() set_phys_to_machine() can return false on failure, which means a memory allocation failure for the p2m structure. It can only fail if setting the mfn for a pfn in previously unused address space. It is guaranteed to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating the mfn for an existing pfn. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge	58e05027b5	xen: convert p2m to a 3 level tree Make the p2m structure a 3 level tree which covers the full possible physical space. The p2m structure contains mappings from the domain's pfns to system-wide mfns. The structure has 3 levels and two roots. The first root is for the domain's own use, and is linked with virtual addresses. The second is all mfn references, and is used by Xen on save/restore to allow it to update the p2m mapping for the domain. At boot, the domain builder provides a simple flat p2m array for all the initially present pages. We construct the two levels above that using the early_brk allocator. After early boot time, set_phys_to_machine() will allocate any missing levels using the normal kernel allocator (at GFP_KERNEL, so it must be called in a normal blocking context). Because the early_brk() API requires us to pre-reserve the maximum amount of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY config option, but its only negative side-effect is to increase the kernel's apparent bss size. However, since all unused brk memory is returned to the heap, there's no real downside to making it large. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:24 -07:00
Jeremy Fitzhardinge	bbbf61eff9	xen: make install_p2mtop_page() static Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge	1f2d9dd309	xen: set the actual extent of the mfn_list_list Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge	b7eb4ad391	xen: set shared_info->arch.max_pfn to max_p2m_pfn Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:22 -07:00
Jeremy Fitzhardinge	1e17fc7eff	xen: remove noise about registering vcpu info Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:21 -07:00
Jeremy Fitzhardinge	764f0138b9	xen: allocate level1_ident_pgt Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:20 -07:00
Jeremy Fitzhardinge	f0991802bb	xen: use early_brk for level2_kernel_pgt Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge	a2e8752987	xen: allocate p2m size based on actual max size Allocate p2m tables based on the actual runtime maximum pfn rather than the static config-time limit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge	a171ce6e7b	xen: dynamically allocate p2m space Use early brk mechanism to allocate p2m tables, to save memory when booting non-Xen. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:18 -07:00
Jeremy Fitzhardinge	5e941c0939	x86: add RESERVE_BRK_ARRAY() helper Useful when converting static arrays into boottime brk allocated objects. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:17 -07:00
Linus Torvalds	db08bf0877	Merge git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic * git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic/io.h: allow people to override individual funcs bitops: remove duplicated extern declarations bitops: make asm-generic/bitops/find.h more generic asm-generic: kdebug.h: Checkpatch cleanup asm-generic: fcntl: make exported headers use strict posix types asm-generic: cmpxchg does not handle non-long arguments asm-generic: make atomic_add_unless a function	2010-10-22 11:17:06 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Linus Torvalds	91151240ed	Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE x86: Move alloc_desk_mask variables inside ifdef x86-32: Align IRQ stacks properly x86: Remove CONFIG_4KSTACKS x86: Always use irq stacks Fixed up trivial conflicts in include/linux/{irq.h, percpu-defs.h}	2010-10-22 08:54:21 -07:00
Linus Torvalds	211baf4ffc	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Hpet: Avoid the comparator readback penalty	2010-10-22 08:47:45 -07:00
Rakib Mullick	a69a0612c4	[CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit. This patch fixes the following warning. The function longrun_cpu_init() is marked with __cpuinit which calls longrun_get_policy() which is a __init function. So make longrun_get_policy with __cpuinit. WARNING: arch/x86/kernel/cpu/cpufreq/longrun.o(.cpuinit.text+0x4c5): Section mismatch in reference from the function longrun_cpu_init() to the function .init.text:longrun_get_policy() The function __cpuinit longrun_cpu_init() references a function __init longrun_get_policy(). If longrun_get_policy is only used by longrun_cpu_init then annotate longrun_get_policy with a matching annotation. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Dave Jones <davej@redhat.com>	2010-10-22 11:44:47 -04:00
Julia Lawall	b2a33c1728	[CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type In each case, the function has an unsigned return type, but returns a negative constant to indicate an error condition. Each function is only called once. For nforce2_detect_chipset, the result is only compared to 0, and for longrun_determine_freqs, the result is stored in a variable of type (signed) int. Thus, for both functions, unsigned can be dropped from the return type. A sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @exists@ identifier f; constant C; @@ unsigned f(...) { <+... * return -C; ...+> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Dave Jones <davej@redhat.com>	2010-10-22 11:44:47 -04:00
Andi Kleen	46e387bbd8	Merge branch 'hwpoison-hugepages' into hwpoison Conflicts: mm/memory-failure.c	2010-10-22 17:40:48 +02:00
Peter Zijlstra	96681fc3c9	perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations For performance reasons its best to use memory node local memory for per-cpu buffers. This logic comes from a much larger patch proposed by Stephane. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.514465326@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:26 +02:00
Peter Zijlstra	f80c9e304b	perf, x86: Clean up reserve_ds_buffers() signature Now that reserve_ds_buffers() never fails, change it to return void and remove all code dealing with the error return. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.462621937@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:26 +02:00
Peter Zijlstra	6809b6ea73	perf, x86: Less disastrous PEBS/BTS buffer allocation failure Currently PEBS/BTS buffers are allocated when we instantiate the first event, when this fails everything fails. This is a problem because esp. BTS tries to allocate a rather large buffer (64K), which can easily fail. This patch changes the logic such that when either buffer allocation fails, we simply don't allow events that would use these facilities, but continue functioning for all other events. This logic comes from a much larger patch proposed by Stephane. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.354429461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:26 +02:00
Peter Zijlstra	5553be2620	perf, x86: Fixup the precise_ip computation In case we don't have PEBS, the LBR fixup doesn't make sense. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.354429461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:25 +02:00
Peter Zijlstra	65af94baca	perf, x86: Extract DS alloc/free functions Again, mostly a cleanup to unclutter the reserve_ds_buffer() code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.304495776@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:25 +02:00
Peter Zijlstra	5ee25c8731	perf, x86: Extract PEBS/BTS allocation functions Mostly a cleanup.. it reduces code indentation and makes the code flow of reserve_ds_buffers() clearer. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.253453452@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:25 +02:00
Peter Zijlstra	b39f88acd7	perf, x86: Extract PEBS/BTS buffer free routines So that we may grow additional call-sites.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.196793164@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:24 +02:00
Jan Beulich	07bd8516a2	x86, asm: Restore parentheses around one pushl_cfi argument These were (intentionally) stripped by "fix CFI macro invocations to deal with shortcomings in gas" to expose problems with unexpected splitting of arguments by older gas also on newer versions, but as it turns out there is at least one distro (Ubuntu 6.06) where even not having any spaces in a macro argument doesn't reliably prevent splitting into multiple arguments. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4CC157DB020000780001E8A2@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 10:51:44 +02:00
Linus Torvalds	3044100e58	Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits) x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S xen: Cope with unmapped pages when initializing kernel pagetable memblock, bootmem: Round pfn properly for memory and reserved regions memblock: Annotate memblock functions with __init_memblock memblock: Allow memblock_init to be called early memblock/arm: Fix memblock_region_is_memory() typo x86, memblock: Remove __memblock_x86_find_in_range_size() memblock: Fix wraparound in find_region() x86-32, memblock: Make add_highpages honor early reserved ranges x86, memblock: Fix crashkernel allocation arm, memblock: Fix the sparsemem build memblock: Fix section mismatch warnings powerpc, memblock: Fix memblock API change fallout memblock, microblaze: Fix memblock API change fallout x86: Remove old bootmem code x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve x86: Remove not used early_res code x86, memblock: Replace e820_/_early string with memblock_ x86: Use memblock to replace early_res x86, memblock: Use memblock_debug to control debug message print out ... Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile	2010-10-21 18:52:11 -07:00
Linus Torvalds	e36f561a2c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags: Fix IRQ flag handling naming MIPS: Add missing #inclusions of <linux/irq.h> smc91x: Add missing #inclusion of <linux/irq.h> Drop a couple of unnecessary asm/system.h inclusions SH: Add missing consts to sys_execve() declaration Blackfin: Rename IRQ flags handling functions Blackfin: Add missing dep to asm/irqflags.h Blackfin: Rename DES PC2() symbol to avoid collision Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header Blackfin: Split PLL code from mach-specific cdef headers	2010-10-21 14:37:27 -07:00
Linus Torvalds	157b6ceb13	Merge branch 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, iommu: Update header comments with appropriate naming ia64, iommu: Add a dummy iommu_table.h file in IA64. x86, iommu: Fix IOMMU_INIT alignment rules x86, doc: Adding comments about .iommu_table and its neighbors. x86, iommu: Utilize the IOMMU_INIT macros functionality. x86, VT-d: Make Intel VT-d IOMMU use IOMMU_INIT_* macros. x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros. x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros. x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros. x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros. x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine. x86, iommu: Add proper dependency sort routine (and sanity check). x86, iommu: Make all IOMMU's detection routines return a value. x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure	2010-10-21 14:23:48 -07:00
Linus Torvalds	4a60cfa945	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits) apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets apic, x86: Check if EILVT APIC registers are available (AMD only) x86: ioapic: Call free_irte only if interrupt remapping enabled arm: Use ARCH_IRQ_INIT_FLAGS genirq, ARM: Fix boot on ARM platforms genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build x86: Switch sparse_irq allocations to GFP_KERNEL genirq: Switch sparse_irq allocator to GFP_KERNEL genirq: Make sparse_lock a mutex x86: lguest: Use new irq allocator genirq: Remove the now unused sparse irq leftovers genirq: Sanitize dynamic irq handling genirq: Remove arch_init_chip_data() x86: xen: Sanitise sparse_irq handling x86: Use sane enumeration x86: uv: Clean up the direct access to irq_desc x86: Make io_apic.c local functions static genirq: Remove irq_2_iommu x86: Speed up the irq_remapped check in hot pathes intr_remap: Simplify the code further ... Fix up trivial conflicts in arch/x86/Kconfig	2010-10-21 14:11:46 -07:00
Linus Torvalds	5fe8321b88	Merge branch 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, x2apic: Simplify apic init in SMP and UP builds x86, intr-remap: Remove IRTE setup duplicate code x86, intr-remap: Set redirection hint in the IRTE	2010-10-21 13:54:05 -07:00
Linus Torvalds	709d9f54cc	Merge branch 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI x86, vmware: Remove deprecated VMI kernel support Fix up trivial #include conflict in arch/x86/kernel/smpboot.c	2010-10-21 13:53:24 -07:00
Linus Torvalds	cca8209ed9	Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: XO-1 uses/depends on PCI x86, olpc: Register XO-1 platform devices x86, olpc: Add XO-1 poweroff support x86, olpc: Don't retry EC commands forever x86, olpc: Rework BIOS signature check x86, olpc: Only enable PCI configuration type override on XO-1	2010-10-21 13:52:01 -07:00
Linus Torvalds	d77bdc423d	Merge branch 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mtrr: Support mtrr lookup for range spanning across MTRR range x86, mtrr: Refactor MTRR type overlap check code	2010-10-21 13:51:41 -07:00
Linus Torvalds	87affd0b94	Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: sfi: Make local functions static x86, earlyprintk: Add hsu early console for Intel Medfield platform x86, earlyprintk: Add earlyprintk for Intel Moorestown platform x86: Add two helper macros for fixed address mapping x86, mrst: A function in a header file needs to be marked "inline"	2010-10-21 13:47:54 -07:00
Linus Torvalds	c3b86a2942	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush	2010-10-21 13:47:29 -07:00
Linus Torvalds	8d8d2e9ccd	Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mem: Optimize memmove for small size and unaligned cases x86, mem: Optimize memcpy by avoiding memory false dependece x86, mem: Don't implement forward memmove() as memcpy()	2010-10-21 13:46:28 -07:00
Linus Torvalds	2a8b67fb72	Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line x86, hotplug: Move WBINVD back outside the play_dead loop x86, hotplug: Use mwait to offline a processor, fix the legacy case x86, mwait: Move mwait constants to a common header file	2010-10-21 13:45:38 -07:00
Linus Torvalds	b6f7e38dbb	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fpu: Merge fpu_save_init() x86-32, fpu: Rewrite fpu_save_init() x86, fpu: Remove PSHUFB_XMM5_* macros x86, fpu: Remove unnecessary ifdefs from i387 code. x86-32, fpu: Remove math_emulate stub x86-64, fpu: Simplify constraints for fxsave/fxtstor x86-64, fpu: Fix %cs value in convert_from_fxsr() x86-64, fpu: Disable preemption when using TS_USEDFPU x86, fpu: Merge __save_init_fpu() x86, fpu: Merge tolerant_fwait() x86, fpu: Merge fpu_init() x86: Use correct type for %cr4 x86, xsave: Disable xsave in i387 emulation mode Fixed up fxsaveq-induced conflict in arch/x86/include/asm/i387.h	2010-10-21 13:34:32 -07:00
Alok Kataria	76fac077db	x86, kexec: Make sure to stop all CPUs before exiting the kernel x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait" this allows the caller to specify if it wants to stop until all the cpus have processed the stop IPI. This is required specifically for the kexec case where we should wait for all the cpus to be stopped before starting the new kernel. We now wait for the cpus to stop in all cases except for panic/kdump where we expect things to be broken and we are doing our best to make things work anyway. This patch fixes a legitimate regression, which was introduced during 2.6.30, by commit id `4ef702c10b`. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: <stable@kernel.org> v2.6.30-36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-21 13:30:44 -07:00
Linus Torvalds	214515b578	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove pr_<level> uses of KERN_<level> therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal' x86: Use {push,pop}{l,q}_cfi in more places i386: Add unwind directives to syscall ptregs stubs x86-64: Use symbolics instead of raw numbers in entry_64.S x86-64: Adjust frame type at paranoid_exit: x86-64: Fix unwind annotations in syscall stubs	2010-10-21 13:20:32 -07:00
Linus Torvalds	bf70030dc0	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cpu: Fix X86_FEATURE_NOPL x86, cpu: Re-run get_cpu_cap() after adjusting the CPUID level	2010-10-21 13:18:36 -07:00
Linus Torvalds	d60a2793ba	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove stale pmtimer_64.c x86, cleanups: Use clear_page/copy_page rather than memset/memcpy x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI x86, cleanup: Remove obsolete boot_cpu_id variable	2010-10-21 13:18:06 -07:00
Linus Torvalds	781c5a67f1	Merge branch 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, bios: Make the x86 early memory reservation a kernel option x86, bios: By default, reserve the low 64K for all BIOSes	2010-10-21 13:06:49 -07:00
Linus Torvalds	e990c77d06	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, asm: If the assembler supports fxsave64, use it i386: Make kernel_execve() suitable for stack unwinding	2010-10-21 13:06:00 -07:00
Linus Torvalds	2f0384e5fc	Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd_nb: Enable GART support for AMD family 0x15 CPUs x86, amd: Use compute unit information to determine thread siblings x86, amd: Extract compute unit information for AMD CPUs x86, amd: Add support for CPUID topology extension of AMD CPUs x86, nmi: Support NMI watchdog on newer AMD CPU families x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB x86, k8-gart: Decouple handling of garts and northbridges x86, cacheinfo: Fix dependency of AMD L3 CID x86, kvm: add new AMD SVM feature bits x86, cpu: Fix allowed CPUID bits for KVM guests x86, cpu: Update AMD CPUID feature bits x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit x86, AMD: Remove needless CPU family check (for L3 cache info) x86, tsc: Remove CPU frequency calibration on AMD	2010-10-21 13:01:08 -07:00
Linus Torvalds	bc4016f481	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) sched: Export account_system_vtime() sched: Call tick_check_idle before __irq_enter sched: Remove irq time from available CPU power sched: Do not account irq time to current task x86: Add IRQ_TIME_ACCOUNTING sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time sched: Add a PF flag for ksoftirqd identification sched: Consolidate account_system_vtime extern declaration sched: Fix softirq time accounting sched: Drop group_capacity to 1 only if local group has extra capacity sched: Force balancing on newidle balance if local group has capacity sched: Set group_imb only a task can be pulled from the busiest cpu sched: Do not consider SCHED_IDLE tasks to be cache hot sched: Drop all load weight manipulation for RT tasks sched: Create special class for stop/migrate work sched: Unindent labels sched: Comment updates: fix default latency and granularity numbers tracing/sched: Add sched_pi_setprio tracepoint sched: Give CPU bound RT tasks preference sched: Try not to migrate higher priority RT tasks ...	2010-10-21 12:55:43 -07:00
Linus Torvalds	5d70f79b5e	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits) tracing: Fix compile issue for trace_sched_wakeup.c [S390] hardirq: remove pointless header file includes [IA64] Move local_softirq_pending() definition perf, powerpc: Fix power_pmu_event_init to not use event->ctx ftrace: Remove recursion between recordmcount and scripts/mod/empty jump_label: Add COND_STMT(), reducer wrappery perf: Optimize sw events perf: Use jump_labels to optimize the scheduler hooks jump_label: Add atomic_t interface jump_label: Use more consistent naming perf, hw_breakpoint: Fix crash in hw_breakpoint creation perf: Find task before event alloc perf: Fix task refcount bugs perf: Fix group moving irq_work: Add generic hardirq context callbacks perf_events: Fix transaction recovery in group_sched_in() perf_events: Fix bogus AMD64 generic TLB events perf_events: Fix bogus context time tracking tracing: Remove parent recording in latency tracer graph options tracing: Use one prologue for the preempt irqs off tracer function tracers ...	2010-10-21 12:54:49 -07:00
Linus Torvalds	1053e6bba0	Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Update copyright headers x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend AGP: Warn when GATT memory cannot be set to UC x86, GART: Disable GART table walk probes x86, GART: Remove superfluous AMD64_GARTEN	2010-10-21 12:49:15 -07:00
Daniel Drake	260586d2b4	Add OLPC XO-1 rfkill driver Add a software rfkill switch for the WLAN interface in the OLPC XO-1 laptop. It uses the OLPC embedded controller to cut/restore power to the Marvell WLAN chip on the motherboard. Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Matthew Garrett <mjg@redhat.com>	2010-10-21 10:10:44 -04:00
Konrad Rzeszutek Wilk	5bba6c56dc	X86/PCI: Remove the dependency on isapnp_disable. This looks to be vestigial dependency that had never been used even in the original code base (2.6.18) from which this driver was up-ported. Without this fix, with the CONFIG_ISAPNP, we get this compile failure: arch/x86/pci/xen.c: In function 'pci_xen_init': arch/x86/pci/xen.c:138: error: 'isapnp_disable' undeclared (first use in this function) arch/x86/pci/xen.c:138: error: (Each undeclared identifier is reported only once arch/x86/pci/xen.c:138: error: for each function it appears in.) Reported-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-21 09:36:07 -04:00
Ian Campbell	de1ef2065c	xen/privcmd: move remap_domain_mfn_range() to core xen code and export. This allows xenfs to be built as a module, previously it required flush_tlb_all and arbitrary_virt_to_machine to be exported. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-20 16:22:34 -07:00
Jeremy Fitzhardinge	1246ae0bb9	xen: add variable hypercall caller Allow non-constant hypercall to be called, for privcmd. [ Impact: make arbitrary hypercalls; needed for privcmd ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-20 16:22:27 -07:00
Jeremy Fitzhardinge	eba3ff8b99	xen: add xen_set_domain_pte() Add xen_set_domain_pte() to allow setting a pte mapping a page from another domain. The common case is to map from DOMID_IO, the pseudo domain which owns all IO pages, but will also be used in the privcmd interface to map other domain pages. [ Impact: new Xen-internal API for cross-domain mappings ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-20 16:22:27 -07:00
FUJITA Tomonori	66f2b06154	x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G Set CONFIG_ARCH_DMA_ADDR_T_64BIT when we set dma_addr_t to 64 bits in <asm/types.h>; this allows Kconfig decisions based on this property. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <201010202255.o9KMtZXu009370@imap1.linux-foundation.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 16:02:42 -07:00
Shaohua Li	9329672021	x86: Spread tlb flush vector between nodes Currently flush tlb vector allocation is based on below equation: sender = smp_processor_id() % 8 This isn't optimal, CPUs from different node can have the same vector, this causes a lot of lock contention. Instead, we can assign the same vectors to CPUs from the same node, while different node has different vectors. This has below advantages: a. if there is lock contention, the lock contention is between CPUs from one node. This should be much cheaper than the contention between nodes. b. completely avoid lock contention between nodes. This especially benefits kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity to specific node. In my test, this could reduce > 20% CPU overhead in extreme case.The test machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd to the first CPU of the node. I run a workload with 4 sequential mmap file read thread. The files are empty sparse file. This workload will trigger a lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the extreme tlb flush lock contention because otherwise kswapd keeps migrating between CPUs of a node and I can't get stable result. Sure in real workload, we can't always see so big tlb flush lock contention, but it's possible. [ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 14:44:42 -07:00
Borislav Petkov	b40827fa72	x86-32, mm: Add an initial page table for core bootstrapping This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs - we have initial_page_table for that. Signed-off-by: Borislav Petkov <bp@alien8.de> LKML-Reference: <20101020070526.GA9588@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 14:23:55 -07:00
H. Peter Anvin	d25e6b0b32	Merge branch 'x86/cleanups' into x86/trampoline	2010-10-20 14:22:45 -07:00
H. Peter Anvin	e44dea35cc	Merge branch 'x86/vmware' into x86/trampoline	2010-10-20 13:18:17 -07:00
Borislav Petkov	f01f7c56a1	x86, mm: Fix incorrect data type in vmalloc_sync_all() arch/x86/mm/fault.c: In function 'vmalloc_sync_all': arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast introduced by `617d34d9e5`. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 12:54:04 -07:00
Robert Richter	27afdf2008	apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets We want the BIOS to setup the EILVT APIC registers. The offsets were hardcoded and BIOS settings were overwritten by the OS. Now, the subsystems for MCE threshold and IBS determine the LVT offset from the registers the BIOS has setup. If the BIOS setup is buggy on a family 10h system, a workaround enables IBS. If the OS determines an invalid register setup, a "[Firmware Bug]: " error message is reported. We need this change also for upcomming cpu families. Signed-off-by: Robert Richter <robert.richter@amd.com> LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-20 04:42:13 +02:00
Robert Richter	a68c439b19	apic, x86: Check if EILVT APIC registers are available (AMD only) This patch implements checks for the availability of LVT entries (APIC500-530) and reserves it if used. The check becomes necessary since we want to let the BIOS provide the LVT offsets. The offsets should be determined by the subsystems using it like those for MCE threshold or IBS. On K8 only offset 0 (APIC500) and MCE interrupts are supported. Beginning with family 10h at least 4 offsets are available. Since offsets must be consistent for all cores, we keep track of the LVT offsets in software and reserve the offset for the same vector also to be used on other cores. An offset is freed by setting the entry to APIC_EILVT_MASKED. If the BIOS is right, there should be no conflicts. Otherwise a "[Firmware Bug]: ..." error message is generated. However, if software does not properly determines the offsets, it is not necessarily a BIOS bug. Signed-off-by: Robert Richter <robert.richter@amd.com> LKML-Reference: <1286360874-1471-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-20 04:42:13 +02:00
Ingo Molnar	14d4962dc8	Merge branch 'linus' into irq/core Merge reason: update to almost-final-.36 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-20 04:38:59 +02:00
Jan Beulich	3234282f33	x86, asm: Fix CFI macro invocations to deal with shortcomings in gas gas prior to (perhaps) 2.16.90 has problems with passing non- parenthesized expressions containing spaces to macros. Spaces, however, get inserted by cpp between any macro expanding to a number and a subsequent + or -. For the +, current x86 gas then removes the space again (future gas may not do so), but for the - the space gets retained and is then considered a separator between macro arguments. Fix the respective definitions for both the - and + cases, so that they neither contain spaces nor make cpp insert any (the latter by adding seemingly redundant parentheses). Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 14:28:02 -07:00
Jeremy Fitzhardinge	617d34d9e5	x86, mm: Hold mm->page_table_lock while doing vmalloc_sync Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <ian.cambell@eu.citrix.com> Originally-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CB88A4C.1080305@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge	44235dcde4	x86, mm: Fix bogus whitespace in sync_global_pgds() Whitespace cleanup only. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:56:03 -07:00
Avi Kivity	9581d442b9	KVM: Fix fs/gs reload oops with invalid ldt kvm reloads the host's fs and gs blindly, however the underlying segment descriptors may be invalid due to the user modifying the ldt after loading them. Fix by using the safe accessors (loadsegment() and load_gs_index()) instead of home grown unsafe versions. This is CVE-2010-3698. KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-19 14:21:45 -02:00
Yinghai Lu	9717967c4b	x86: ioapic: Call free_irte only if interrupt remapping enabled On a system that support intr-rempping when booting with "intremap=off" [ 177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 [ 177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0 ... [ 178.173326] Call Trace: [ 178.173574] [<ffffffff810515b4>] destroy_irq+0x3a/0x75 [ 178.192934] [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10 [ 178.193418] [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f [ 178.213021] [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb Call free_irte only when interrupt remapping is enabled. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CBCB274.7010108@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-19 09:25:33 +02:00
Venkatesh Pallipadi	e82b8e4ea4	x86: Add IRQ_TIME_ACCOUNTING This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it when TSC is enabled. This change just enables fine grained irq time accounting, isn't used yet. Following patches use it for different purposes. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 20:52:25 +02:00
Peter Zijlstra	e360adbe29	irq_work: Add generic hardirq context callbacks Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 19:58:50 +02:00
Stephane Eranian	ba0cef3d14	perf_events: Fix bogus AMD64 generic TLB events PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x7 (to count all possibilities). PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x3 (to count all possibilities). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: <stable@kernel.org> # as far back as it applies LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 19:58:48 +02:00
Konrad Rzeszutek Wilk	74226b8c8a	xen/pci: Request ACS when Xen-SWIOTLB is activated. It used to done in the Xen startup code but that is not really appropiate. [v2: Update Kconfig with PCI requirement] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-18 10:49:38 -04:00
Alex Nixon	b5401a96b5	xen/x86/PCI: Add support for the Xen PCI subsystem The frontend stub lives in arch/x86/pci/xen.c, alongside other sub-arch PCI init code (e.g. olpc.c). It provides a mechanism for Xen PCI frontend to setup/destroy legacy interrupts, MSI/MSI-X, and PCI configuration operations. [ Impact: add core of Xen PCI support ] [ v2: Removed the IOMMU code and only focusing on PCI.] [ v3: removed usage of pci_scan_all_fns as that does not exist] [ v4: introduced pci_xen value to fix compile warnings] [ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs] [ v7: added Acked-by] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Qing He <qing.he@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org	2010-10-18 10:49:35 -04:00
Stefano Stabellini	294ee6f89c	x86: Introduce x86_msi_ops Introduce an x86 specific indirect mechanism to setup MSIs. The MSI setup functions become function pointers in an x86_msi_ops struct, that defaults to the implementation in io_apic.c and msi.c. [v2: Use HAVE_DEFAULT_* knobs] Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-18 10:49:34 -04:00
Jeremy Fitzhardinge	5ee01f49c9	x86/PCI: make sure _PAGE_IOMAP it set on pci mappings When mapping pci space via /sys or /proc, make sure we're really doing a hardware mapping by setting _PAGE_IOMAP. [ Impact: bugfix; make PCI mappings map the right pages ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: x86@kernel.org	2010-10-18 10:49:31 -04:00
Alex Nixon	44de3395a4	x86/PCI: Clean up pci_cache_line_size Separate out x86 cache_line_size initialisation code into its own function (so it can be shared by Xen later in this patch series) [ Impact: cleanup ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: x86@kernel.org	2010-10-18 10:49:30 -04:00
Jeremy Fitzhardinge	7b586d7185	x86/io_apic: add get_nr_irqs_gsi() Impact: new interface to get max GSI Add get_nr_irqs_gsi() to return nr_irqs_gsi. Xen will use this to determine how many irqs it needs to reserve for hardware irqs. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-18 10:40:30 -04:00
Jeremy Fitzhardinge	d8e0420603	xen: define BIOVEC_PHYS_MERGEABLE() Impact: allow Xen control of bio merging When running in Xen domain with device access, we need to make sure the block subsystem doesn't merge requests across pages which aren't machine physically contiguous. To do this, we define our own BIOVEC_PHYS_MERGEABLE. When CONFIG_XEN isn't enabled, or we're not running in a Xen domain, this has identical behaviour to the normal implementation. When running under Xen, we also make sure the underlying machine pages are the same or adjacent. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-18 10:40:28 -04:00
Alex Nixon	23ace955c2	xen: Don't disable the I/O space If a guest domain wants to access PCI devices through the frontend driver (coming later in the patch series), it will need access to the I/O space. [ Impact: Allow for domU IO access, preparing for pci passthrough ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-18 10:40:27 -04:00
Justin P. Mattock	50a23e6eec	Update broken web addresses in arch directory. The patch below updates broken web addresses in the arch directory. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-10-18 11:03:21 +02:00
Bjorn Helgaas	1ca98fa652	x86/PCI: MMCONFIG: fix region end calculation The end of an MMCONFIG region depends on the ending bus number, not on the number of buses the region covers. We previously computed the wrong ending address whenever the starting bus number was non-zero, e.g.,: MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000) MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe1ffffff] (base 0xe0000000) The correct regions are: MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000) MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe3ffffff] (base 0xe0000000) Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-17 20:03:07 -07:00
Seth Heasley	cb04e95bdd	PCI: update Intel chipset names and defines This patch updates the defines for Intel devices in include/linux/pci_ids.h, referenced in arch/x86/pci/irq.c and drivers/i2c/busses/i2c-i801.c, reflecting approved legal branding, and using fuller code-names for products under development. Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-17 20:03:04 -07:00
Seth Heasley	25143fd127	x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs This patch adds the LPC Controller DeviceIDs for the Intel Patsburg PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-15 13:09:52 -07:00
Daniel Drake	80e7b19ae1	PCI: OLPC: Only enable PCI configuration type override on XO-1 This configuration type override is for XO-1 only and must not happen on XO-1.5. Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-15 13:09:51 -07:00
Thomas Gleixner	40ffa93791	x86: Remove stale pmtimer_64.c This file is unused since the apic unification in 2.6.29, but nobody noticed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-15 21:18:59 +02:00
Thomas Gleixner	940b3c7b19	x86: sfi: Make local functions static Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org>	2010-10-15 19:37:39 +02:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode i, struct file f) { <+... ( nonseekable_open(...) \| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file f, char p, size_t s, loff_t off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { <+... ( off = E \| off += E \| func(..., off, ...) \| E = off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file f, const char p, size_t s, loff_t off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable / }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, / open uses nonseekable / }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, / we have seq_read / }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, / readdir is present / }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, / read accesses f_pos / }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, / write accesses f_pos / }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, / read and write both use no f_pos / }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, / write uses no f_pos / }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, / read uses no f_pos / }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, / no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
Robert Richter	b47fad3bfb	oprofile, x86: Add support for IBS periodic op counter extension The count value for IBS op sampling has been extended by 7 bits. The feature is reflected in bit 6 (OpCntExt) of the IBS capability register (CPUID Fn8000_001B_EAX). Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:43 +02:00
Robert Richter	25da695047	oprofile, x86: Add support for IBS branch target address reporting This patch adds support for IBS branch target address reporting. A new MSR (MSRC001_103B IBS Branch Target Address) has been added that provides the logical address in canonical form for the branch target. The size of the IBS sample that is transferred to the userland has been increased. For backward compatibility, the userland daemon must explicit enable the feature by writing to the oprofilefs file ibs_op/branch_target After enabling branch target address reporting, the userland daemon must handle the extended size of the IBS sample. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:42 +02:00
Robert Richter	53b39e9480	oprofile, x86: Introduce struct ibs_state This patch introduces struct ibs_state that will extended by additinal members in follow-on patches. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:41 +02:00
Robert Richter	fc889aa23f	oprofile, x86: Remove duplicate check for IBS_CAPS_OPCNT Since oprofile is setting up ibs_op/dispatched_ops in the fs only if the feature is available, its corresponding variable ibs_config.dispatched_ops is only set, if the feature is available. Thus the check is duplicate and can be removed. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:41 +02:00
Robert Richter	4ac945f002	oprofile, x86: Check IBS capability bits 1 and 2 There are IBS CPUID feature flags in CPUID Fn8000_001B to detect if the cpu supports IBS fetch sampling (FetchSam) and/or IBS execution sampling (OpSam). This patch adds checks if the both features are available. Spec: http://support.amd.com/us/Processor_TechDocs/31116.pdf Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:40 +02:00
Robert Richter	e63414740e	oprofile, x86: Add support for AMD family 14h This patch adds support for AMD family 14h (Ontario/Zacate) cpus. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:39 +02:00
Robert Richter	3acbf0849b	oprofile, x86: Add support for AMD family 12h This patch adds support for AMD family 12h (Llano) cpus. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:39 +02:00
Robert Richter	6268464b37	Merge remote branch 'tip/perf/core' into oprofile/core Conflicts: arch/arm/oprofile/common.c kernel/perf_event.c	2010-10-15 12:45:00 +02:00
Randy Dunlap	9e9006e909	x86, olpc: XO-1 uses/depends on PCI olpc-xo1 uses pci_*() interfaces so it should depend on PCI. Otherwise we get build failure like: arch/x86/kernel/olpc-xo1.c:65: error: implicit declaration of function 'pci_enable_device_io' arch/x86/kernel/olpc-xo1.c:71: error: implicit declaration of function 'pci_request_region' arch/x86/kernel/olpc-xo1.c:80: error: implicit declaration of function 'pci_release_region' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Daniel Drake <dsd@laptop.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20101014101313.adf7eb2a.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-15 09:41:38 +02:00
Ingo Molnar	0fdf13606b	Merge branch 'tip/perf/recordmcount-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2010-10-15 06:12:28 +02:00
Steven Rostedt	cf4db2597a	ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT The config option used by archs to let the build system know that the C version of the recordmcount works for said arch is currently called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To be more consistent with the name that all archs may use, it has been renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since we are building a C recordmcount and not a mcount_record. Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-14 23:32:44 -04:00
Ingo Molnar	d9d572a9c0	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-10-15 05:12:45 +02:00
Steven Rostedt	72441cb1fd	ftrace/x86: Add support for C version of recordmcount This patch adds the support for the C version of recordmcount and compile times show ~ 12% improvement. After verifying this works, other archs can add: HAVE_C_MCOUNT_RECORD in its Kconfig and it will use the C version of recordmcount instead of the perl version. Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-14 16:52:41 -04:00
Frederic Weisbecker	ebc8827f75	x86: Barf when vmalloc and kmemcheck faults happen in NMI In x86, faults exit by executing the iret instruction, which then reenables NMIs if we faulted in NMI context. Then if a fault happens in NMI, another NMI can nest after the fault exits. But we don't yet support nested NMIs because we have only one NMI stack. To prevent from that, check that vmalloc and kmemcheck faults don't happen in this context. Most of the other kernel faults in NMIs can be more easily spotted by finding explicit copy_from,to_user() calls on review. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>	2010-10-14 20:43:36 +02:00
Linus Torvalds	0eead9ab41	Don't dump task struct in a.out core-dumps akiphie points out that a.out core-dumps have that odd task struct dumping that was never used and was never really a good idea (it goes back into the mists of history, probably the original core-dumping code). Just remove it. Also do the access_ok() check on dump_write(). It probably doesn't matter (since normal filesystems all seem to do it anyway), but he points out that it's normally done by the VFS layer, so ... [ I suspect that we should possibly do "vfs_write()" instead of calling ->write directly. That also does the whole fsnotify and write statistics thing, which may or may not be a good idea. ] And just to be anal, do this all for the x86-64 32-bit a.out emulation code too, even though it's not enabled (and won't currently even compile) Reported-by: akiphie <akiphie@lavabit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-14 10:57:40 -07:00
Jeremy Fitzhardinge	67e87f0a1c	x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S head_64.S maps up to 512 MiB, but that is not necessarity true for other entry paths, such as Xen. Thus, co-locate the setting of max_pfn_mapped with the code to actually set up the page tables in head_64.S. The 32-bit code is already so co-located. (The Xen code already sets max_pfn_mapped correctly for its own use case.) -v2: Yinghai fixed the following bug in this patch: \| \| max_pfn_mapped is in .bss section, so we need to set that \| after bss get cleared. Without that we crash on bootup. \| \| That is safe because Xen does not call x86_64_start_kernel(). \| Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Fixed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <4CB6AB24.9020504@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 09:06:49 +02:00
Masami Hiramatsu	3cba11d32b	kconfig/x86: Add HAVE_TEXT_POKE_SMP config for stop_machine dependency Since the text_poke_smp() definately depends on actual stop_machine() on smp, add that dependency to Kconfig. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20101014031042.4100.90877.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 08:55:29 +02:00
Masami Hiramatsu	3caa37519c	x86: Use __stop_machine() in text_poke_smp() Use __stop_machine() in text_poke_smp() because the caller must get online_cpus before calling text_poke_smp(), but stop_machine() do it again. We don't need it. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20101014031036.4100.83989.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 08:55:28 +02:00
Randy Dunlap	03f1a17cd5	x86/vsmp: Eliminate kconfig dependency warning Fix kconfig dependency warning to satisfy dependencies: warning: (X86_VSMP && X86_64 && PCI && X86_EXTENDED_PLATFORM \|\| XEN && PARAVIRT_GUEST && (X86_64 \|\| X86_32 && X86_PAE && !X86_VISWS) && X86_CMPXCHG && X86_TSC \|\| KVM_CLOCK && PARAVIRT_GUEST \|\| KVM_GUEST && PARAVIRT_GUEST \|\| LGUEST_GUEST && PARAVIRT_GUEST && X86_32) selects PARAVIRT which has unmet direct dependencies (PARAVIRT_GUEST) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> LKML-Reference: <20101013210023.9a033222.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 07:04:30 +02:00
Linus Torvalds	509d4486bd	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: For each node, register the memory blocks actually used x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order x86, mce, therm_throt.c: Fix missing curly braces in error handling logic	2010-10-13 16:34:23 -07:00
Jeremy Fitzhardinge	fef5ba7979	xen: Cope with unmapped pages when initializing kernel pagetable Xen requires that all pages containing pagetable entries to be mapped read-only. If pages used for the initial pagetable are already mapped then we can change the mapping to RO. However, if they are initially unmapped, we need to make sure that when they are later mapped, they are also mapped RO. We do this by knowing that the kernel pagetable memory is pre-allocated in the range e820_table_start - e820_table_end, so any pfn within this range should be mapped read-only. However, the pagetable setup code early_ioremaps the pages to write their entries, so we must make sure that mappings created in the early_ioremap fixmap area are mapped RW. (Those mappings are removed before the pages are presented to Xen as pagetable pages.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4CB63A80.8060702@goop.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 16:07:13 -07:00
H. Peter Anvin	d7acb92fea	x86-64, asm: If the assembler supports fxsave64, use it Kbuild allows for us to probe for the existence of specific constructs in the assembler, use them to find out if we can use fxsave64 and permit the compiler to generate better code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 16:00:29 -07:00
Daniel Drake	447b1d43de	x86, olpc: Register XO-1 platform devices The upcoming XO-1 rfkill driver (for drivers/platform/x86) will register itself with the name "xo1-rfkill", and the already-merged XO-1 poweroff code uses name "olpc-xo1" Add the necessary mechanics so that these devices are properly initialized on XO-1 laptops. Signed-off-by: Daniel Drake <dsd@laptop.org> LKML-Reference: <20101013181042.90C8F9D401B@zog.reactivated.net> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 11:51:25 -07:00
Ingo Molnar	3d8a1a6a8a	Merge branch 'amd-iommu/2.6.37' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu	2010-10-13 15:44:24 +02:00
Joerg Roedel	5d0d71569e	x86/amd-iommu: Update copyright headers This patch updates the copyright headers in all source files of the AMD IOMMU driver. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-10-13 11:13:21 +02:00
Matthew Garrett	5bcd757f93	x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend AMD's reference BIOS code had a bug that could result in the firmware failing to reenable the iommu on resume. It transpires that this causes certain less than desirable behaviour when it comes to PCI accesses, to whit them ending up somewhere near Bristol when the more desirable outcome was Edinburgh. Sadness ensues, perhaps along with filesystem corruption. Let's make sure that it gets turned back on, and that we restore its configuration so decisions it makes bear some resemblance to those made by reasonable people rather than crack-addled lemurs who spent all your DMA on Thunderbird. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-10-13 11:11:46 +02:00
Daniel Drake	bf1ebf0079	x86, olpc: Add XO-1 poweroff support Add a pm_power_off handler for the OLPC XO-1 laptop. The driver can be built modular and follows the behaviour of the APM driver, setting pm_power_off to NULL on unload. However, the ability to unload the module will probably be removed (with a simple __module_get(THIS_MODULE)) if/when XO-1 suspend/resume support is added to this file at a later date. Signed-off-by: Daniel Drake <dsd@laptop.org> LKML-Reference: <20101010094032.9AE669D401B@zog.reactivated.net> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-12 17:31:15 -07:00
Thomas Gleixner	2ee3906598	x86: Switch sparse_irq allocations to GFP_KERNEL No callers from atomic context (except boot) anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:46 +02:00
Thomas Gleixner	c2f31c37b7	x86: lguest: Use new irq allocator Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au>	2010-10-12 16:53:45 +02:00
Thomas Gleixner	ad9f43340f	x86: Use sane enumeration Instead of looping through all interrupts, use the bitmap lookup to find the next. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:44 +02:00
Thomas Gleixner	48b2650196	x86: uv: Clean up the direct access to irq_desc Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:43 +02:00
Thomas Gleixner	1a8ce7ff68	x86: Make io_apic.c local functions static No users outside of io_apic.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:43 +02:00
Thomas Gleixner	1a0730d664	x86: Speed up the irq_remapped check in hot pathes irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check based on irq_cfg instead of going through a lookup function. That's especially interesting in the eoi_ioapic_irq() hotpath. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:42 +02:00
Thomas Gleixner	423f085952	x86: Embedd irq_2_iommu into irq_cfg That interrupt remapping code is x86 specific and tied to the io_apic code. No need for separate allocator functions in the interrupt remapping code. This allows to simplify the code and irq_2_iommu is small (13 bytes on 64bit) so it's not a real problem even if interrupt remapping is runtime disabled. If it's compile time disabled the impact is zero. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:41 +02:00
Thomas Gleixner	bc5fdf9f3a	x86: io_apic: Remove the now unused sparse_irq arch_* functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	fbc6bff04a	x86: ioapic: Cleanup sparse irq code Switch over to the new allocator and remove all the magic which was caused by the unability to destroy irq descriptors. Get rid of the create_irq_nr() loop for sparse and non sparse irq. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Yinghai Lu	fe6dab4e79	x86: Don't setup ioapic irq for sci twice The sparseirq rework triggered a warning in the iommu code, which was caused by setting up ioapic for ACPI irq 9 twice. This function is solely to handle interrupts which are on a secondary ioapic and outside the legacy irq range. Replace the sparse irq_to_desc check with a non ifdeffed version. [ tglx: Moved it before the ioapic sparse conversion and simplified the inverse logic ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB00122.3030301@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	f981a3dc19	x86: io_apic: Prepare alloc/free_irq_cfg() Rename the grossly misnamed get_one_free_irq_cfg() to alloc_irq_cfg(). Add a (not yet used) irq number argument to free_irq_cfg() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	08c33db6d0	x86: Implement new allocator functions Implement new allocator functions which make use of the core changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	6e2fff50a5	x86: ioapic: Cleanup get_one_free_irq_cfg() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	7e495529b6	x86: ioapic: Cleanup some more Cleanup after the irq_chip conversion a bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	be5b7bf738	x86: Convert ht set_affinity to new chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	0e09ddf2d7	x86: Cleanup hpet affinity setting Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	fe52b2d259	x86: Convert dmar affinity setting to new chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	b5d1c46579	x86: Convert remapped msi to new chip.irq_set_affinity function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	f19f5ecc92	x86: Convert remapped ioapic affinity setting to new irq chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	5346b2a78f	x86: Convert msi affinity setting to new chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	f7e909eae4	x86: Prepare the affinity common functions for taking struct irq_data * While at it rename it to sensible function names and fix the return value from unsigned to int for __ioapic_set_affinity (set_desc_affinity). Returning -1 in a function returning unsigned int is somewhat strange. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	60c69948e5	x86: ioapic: Clean up the direct access to irq_desc Most of it is useless pseudo optimization. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	e9f7ac664b	ht: Convert to new irq_chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	5c2837fbaa	dmar: Convert to new irq chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <dwmw2@infradead.org>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	d0fbca8f93	x86: ioapic/hpet: Convert to new chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	90297c5fe7	x86: ioapic: Convert mask to new irq_chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	61a38ce3f5	x86: io_apic: Convert startup to new irq_chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	dd5f15e5cf	x86: Cleanup io_apic Sanitize functions. Remove irq_desc pointer magic. Preparatory patch for further cleanups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:36 +02:00
Thomas Gleixner	d4eba29770	x86: Cleanup access to irq_data Fixup the open coded access to irq_desc->[handler_data\|chip_data\|msi-desc] Use the macros and inline functions for it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:36 +02:00
Thomas Gleixner	4305df947c	x86: i8259: Convert to new irq_chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:36 +02:00
Thomas Gleixner	020dd984d7	x86: Cleanup visws interrupt handling Remove the open coded access to irq_desc and convert to the new irq chip functions. Change the mask function of piix4_virtual_irq_type so we can use the generic irq handling function for the virtual interrupt instead of open coding it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	fe25c7fc2e	x86: lguest: Convert to new irq chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	a5ef2e7040	x86: Sanitize apb timer interrupt handling Disable the interrupt in CPU_DEAD where it belongs. Remove the open coded irq_desc manipulation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	a3c08e5d80	x86: Convert irq_chip access to new functions Before moving the irq chips to the new functions, fixup direct callers. The cpu offline irq fixup code needs to become generic and archs need to honour the "force" flag as an indicator, but that's for later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	011d578fda	x86: Remove useless reinitialization of irq descriptors The descriptors are already initialized in exactly this way. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:34 +02:00
Thomas Gleixner	39431acb1a	pci: Cleanup the irq_desc mess in msi Handing down irq_desc to msi just so that msi can access irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code can hand down a pointer to msi_desc so msi code does not need to know about the irq descriptor at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:34 +02:00
Thomas Gleixner	1c9db52534	pci: Convert msi to new irq_chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <linux@arm.linux.org.uk>	2010-10-12 16:53:34 +02:00
Thomas Gleixner	7c5f13519a	Merge branch 'x86/urgent' of into irq/sparseirq Reason: Pull in the latest io_apic bugfixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-12 16:41:26 +02:00
Thomas Gleixner	5e62feabcc	Merge branch 'x86/cleanups' into irq/sparseirq Reason: Avoid conflicts with removal of boot_cpu_id Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-12 16:40:42 +02:00
Thomas Gleixner	8ffcfa4e2d	Merge branch 'x86/x2apic' into irq/sparseirq Reason: Avoid conflicts with the x2apic modifications Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-12 16:39:53 +02:00
Thomas Gleixner	b683de2b3c	genirq: Query arch for number of early descriptors sparse irq sets up NR_IRQS_LEGACY irq descriptors and archs then go ahead and allocate more. Use the unused return value of arch_probe_nr_irqs() to let the architecture return the number of early allocations. Fix up all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:39:08 +02:00
Michal Marek	239060b93b	Merge branch 'kbuild/rc-fixes' into kbuild/kconfig We need to revert the temporary hack in `71ebc01`, hence the merge.	2010-10-12 15:09:06 +02:00
Zhang Rui	dab5fff14d	acpi-cpufreq: fix a memleak when unloading driver We didn't free per_cpu(acfreq_data, cpu)->freq_table when acpi_freq driver is unloaded. Resulting in the following messages in /sys/kernel/debug/kmemleak: unreferenced object 0xf6450e80 (size 64): comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s) hex dump (first 32 bytes): 00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00 ......$.......$. 02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00 .....j.......5.. backtrace: [<c123ba97>] kmemleak_alloc+0x27/0x50 [<c109f96f>] __kmalloc+0xcf/0x110 [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq] [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0 [<c11920b7>] sysdev_driver_register+0x97/0x110 [<c11cce56>] cpufreq_register_driver+0x86/0x140 [<f9dad080>] 0xf9dad080 [<c1001130>] do_one_initcall+0x30/0x160 [<c10626e9>] sys_init_module+0x99/0x1e0 [<c1002d97>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21 Tested-by: Toralf Forster <toralf.foerster@gmx.de> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-10-12 00:58:28 -04:00
H. Peter Anvin	8e4029ee35	Merge branch 'x86/urgent' into core/memblock Reason for merge: Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree. Resolved Conflicts: arch/x86/mm/srat_64.c Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 17:05:11 -07:00
Nikanth Karthikesan	50f2d7f682	x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA commit `d9c2d5ac6a` "x86, numa: Use near(er) online node instead of roundrobin for NUMA" changed NUMA initialization on Intel to choose the nearest online node or first node. Fake NUMA would be better of with round-robin initialization, instead of the all CPUS on first node. Change the choice of first node, back to round-robin. For testing NUMA kernel behaviour without cpusets and NUMA aware applications, it would be better to have cpus in different nodes, rather than all in a single node. With cpusets migration of tasks scenarios cannot not be tested. I guess having it round-robin shouldn't affect the use cases for all cpus on the first node. The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to be the case, which was changed by commit `d9c2d5ac6`. It changed from roundrobin to nearer or first node. And I couldn't find any reason for this change in its changelog. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2010-10-11 16:16:56 -07:00
Jeremy Fitzhardinge	236260b90d	memblock: Allow memblock_init to be called early The Xen setup code needs to call memblock_x86_reserve_range() very early, so allow it to initialize the memblock subsystem before doing so. The second memblock_init() is ignored. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <4CACFDAD.3090900@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 15:59:01 -07:00
Yinghai Lu	73cf624d02	x86, numa: For each node, register the memory blocks actually used Russ reported SGI UV is broken recently. He said: \| The SRAT table shows that memory range is spread over two nodes. \| \| SRAT: Node 0 PXM 0 100000000-800000000 \| SRAT: Node 1 PXM 1 800000000-1000000000 \| SRAT: Node 0 PXM 0 1000000000-1080000000 \| \|Previously, the kernel early_node_map[] would show three entries \|with the proper node. \| \|[ 0.000000] 0: 0x00100000 -> 0x00800000 \|[ 0.000000] 1: 0x00800000 -> 0x01000000 \|[ 0.000000] 0: 0x01000000 -> 0x01080000 \| \|The problem is recent community kernel early_node_map[] shows \|only two entries with the node 0 entry overlapping the node 1 \|entry. \| \| 0: 0x00100000 -> 0x01080000 \| 1: 0x00800000 -> 0x01000000 After looking at the changelog, Found out that it has been broken for a while by following commit \|commit `8716273cae` \|Author: David Rientjes <rientjes@google.com> \|Date: Fri Sep 25 15:20:04 2009 -0700 \| \| x86: Export srat physical topology Before that commit, register_active_regions() is called for every SRAT memory entry right away. Use nodememblk_range[] instead of nodes[] in order to make sure we capture the actual memory blocks registered with each node. nodes[] contains an extended range which spans all memory regions associated with a node, but that does not mean that all the memory in between are included. Reported-by: Russ Anderson <rja@sgi.com> Tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB27BDF.5000800@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> 2.6.33 .34 .35 .36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 15:26:15 -07:00
Zachary Amsden	47008cd887	KVM: x86: Move TSC reset out of vmcb_init The VMCB is reset whenever we receive a startup IPI, so Linux is setting TSC back to zero happens very late in the boot process and destabilizing the TSC. Instead, just set TSC to zero once at VCPU creation time. Why the separate patch? So git-bisect is your friend. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-11 12:36:07 +02:00
Zachary Amsden	58877679fd	KVM: x86: Fix SVM VMCB reset On reset, VMCB TSC should be set to zero. Instead, code was setting tsc_offset to zero, which passes through the underlying TSC. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-11 12:36:07 +02:00
Borislav Petkov	6dcbfe4f0b	x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order This fixes possible cases of not collecting valid error info in the MCE error thresholding groups on F10h hardware. The current code contains a subtle problem of checking only the Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM thresholding group) in its first iteration and breaking out if the bit is cleared. But (!), this MSR contains an offset value, BlkPtr[31:24], which points to the remaining MSRs in this thresholding group which might contain valid information too. But if we bail out only after we checked the valid bit in the first MSR and not the block pointer too, we miss that other information. The thing is, MC4_MISC0[BlkPtr] is not predicated on MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked prior to iterating over the MCI_MISCj thresholding group, irrespective of the MC4_MISC0[Valid] setting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-11 11:04:36 +02:00
Akinobu Mita	708ff2a009	bitops: make asm-generic/bitops/find.h more generic asm-generic/bitops/find.h has the extern declarations of find_next_bit() and find_next_zero_bit() and the macro definitions of find_first_bit() and find_first_zero_bit(). It is only usable by the architectures which enables CONFIG_GENERIC_FIND_NEXT_BIT and disables CONFIG_GENERIC_FIND_FIRST_BIT. x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and CONFIG_GENERIC_FIND_FIRST_BIT. These architectures cannot include asm-generic/bitops/find.h in their asm/bitops.h. So ifdefed extern declarations of find_first_bit and find_first_zero_bit() are put in linux/bitops.h. This makes asm-generic/bitops/find.h usable by these architectures and use it. Also this change is needed for the forthcoming duplicated extern declarations cleanup. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Chris Metcalf <cmetcalf@tilera.com>	2010-10-09 21:51:44 +02:00
Konrad Rzeszutek Wilk	6e96366933	x86, iommu: Update header comments with appropriate naming The header comments diverged a bit from the implementation. Lets re-sync them. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1286564028-2352-3-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-08 13:11:21 -07:00
Ingo Molnar	7cd2541cf2	Merge commit 'v2.6.36-rc7' into perf/core Conflicts: arch/x86/kernel/module.c Merge reason: Resolve the conflict, pick up fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:46:27 +02:00
Jin Dongming	b62be8ea9d	x86, mce, therm_throt.c: Fix missing curly braces in error handling logic When the feature PTS is not supported by CPU, the sysfile package_power_limit_count for package should not be generated. This patch is used for fixing missing { and }. The patch is not complete as there are other error handling problems in this function - but that can wait until the merge window. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@initel.com> Acked-by: Jean Delvare <khali@linux-fr.org> Cc: Brown Len <len.brown@intel.com> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org> LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:29:20 +02:00
Paul Fox	286e5b97eb	x86, olpc: Don't retry EC commands forever Avoids a potential infinite loop. It was observed once, during an EC hacking/debugging session - not in regular operation. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: dilinger@queued.net Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:06:09 +02:00
Feng Tang	4d033556f1	x86, earlyprintk: Add hsu early console for Intel Medfield platform Intel Medfield platform has a high speed UART device, which could act as a early console. To enable early printk of HSU console, simply add "earlyprintk=hsu" in kernel command line. Currently we put the code in the early_printk_mrst.c as it is also for Intel MID platforms like the mrst early console Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: greg@kroah.com LKML-Reference: <1284361736-23011-5-git-send-email-feng.tang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:01:47 +02:00
Feng Tang	c20b5c3318	x86, earlyprintk: Add earlyprintk for Intel Moorestown platform Intel Moorestown platform has a spi-uart device(Maxim3110), which connects to a Designware spi core controller. This patch will add early console function based on it. As it will be used long before Linux spi subsystem get initialised, we simply directly manipulate the spi controller's register to acheive the early console func. This is safe as it will be disabled when devices subsytem get initialised. To use it, user need enable CONFIG_X86_MRST_EARLY_PRINTK in kenrel config and add "earlyprintk=mrst" in kernel command line. Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: greg@kroah.com LKML-Reference: <1284361736-23011-4-git-send-email-feng.tang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:01:47 +02:00
Feng Tang	5a47c7dae8	x86: Add two helper macros for fixed address mapping Sometimes fixmap will be used to map an physical address which is not PAGE align, so to use it we need first map it and then add the address offset to the mapped fixed address. These 2 new helpers are suggested by Ingo Molnar to make the process simpler. For a physicall address like "phys", a directly usable virtual address can be get by virt = (void )set_fixmap_offset(fixed_idx, phys); or virt = (void )set_fixmap_offset_nocache(fixed_idx, phys); (depends on whether the physical address is cachable or not). Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: alan@linux.intel.com Cc: greg@kroah.com Cc: x86@kernel.org LKML-Reference: <1284361736-23011-3-git-send-email-feng.tang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:01:46 +02:00
Andi Kleen	f672b49b07	x86: HWPOISON: Report correct address granuality for huge hwpoison faults An earlier patch fixed the hwpoison fault handling to encode the huge page size in the fault code of the page fault handler. This is needed to report this information in SIGBUS to user space. This is a straight forward patch to pass this information through to the signal handling in the x86 specific fault.c Cc: x86@kernel.org Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: fengguang.wu@intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:46 +02:00
Ingo Molnar	153db80f8c	Merge commit 'v2.6.36-rc7' into core/memblock Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 09:15:00 +02:00
Zhao Yakui	68f4d5a00a	x86, setup: Use string copy operation to optimze copy in kernel compression The kernel decompression code parses the ELF header and then copies the segment to the corresponding destination. Currently it uses slow byte-copy code. This patch makes it use the string copy operations instead. In the test the copy performance can be improved very significantly after using the string copy operation mechanism. 1. The copy time can be reduced from 150ms to 20ms on one Atom machine 2. The copy time can be reduced about 80% on another machine The time is reduced from 7ms to 1.5ms when using 32-bit kernel. The time is reduced from 10ms to 2ms when using 64-bit kernel. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> LKML-Reference: <1286502453-7043-1-git-send-email-yakui.zhao@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-07 21:23:09 -07:00
H. Peter Anvin	55572b293b	x86, mrst: A function in a header file needs to be marked "inline" A function in a header file needs to be explicitly marked "inline", or gcc will complain if it is not used. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: <stable@kernel.org> v2.6.36 LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>	2010-10-07 16:45:18 -07:00
Namhyung Kim	a416e9e1dd	x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation On 32-bit non-PAE system, cast to 'phys_addr_t' truncates value before subtraction. Subtracting before cast produce same result but remove following warnings from sparse: arch/x86/include/asm/pgtable_types.h:255:38: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable_types.h:270:38: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable.h:127:32: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable.h:132:32: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable.h:344:31: warning: cast truncates bits from constant value (100000000 becomes 0) 64-bit or PAE machines will not be affected by this change. Signed-off-by: Namhyung Kim <namhyung@gmail.com> LKML-Reference: <1285770588-14065-1-git-send-email-namhyung@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-07 16:36:17 -07:00
David Howells	df9ee29270	Fix IRQ flag handling naming Fix the IRQ flag handling naming. In linux/irqflags.h under one configuration, it maps: local_irq_enable() -> raw_local_irq_enable() local_irq_disable() -> raw_local_irq_disable() local_irq_save() -> raw_local_irq_save() ... and under the other configuration, it maps: raw_local_irq_enable() -> local_irq_enable() raw_local_irq_disable() -> local_irq_disable() raw_local_irq_save() -> local_irq_save() ... This is quite confusing. There should be one set of names expected of the arch, and this should be wrapped to give another set of names that are expected by users of this facility. Change this to have the arch provide: flags = arch_local_save_flags() flags = arch_local_irq_save() arch_local_irq_restore(flags) arch_local_irq_disable() arch_local_irq_enable() arch_irqs_disabled_flags(flags) arch_irqs_disabled() arch_safe_halt() Then linux/irqflags.h wraps these to provide: raw_local_save_flags(flags) raw_local_irq_save(flags) raw_local_irq_restore(flags) raw_local_irq_disable() raw_local_irq_enable() raw_irqs_disabled_flags(flags) raw_irqs_disabled() raw_safe_halt() with type checking on the flags 'arguments', and then wraps those to provide: local_save_flags(flags) local_irq_save(flags) local_irq_restore(flags) local_irq_disable() local_irq_enable() irqs_disabled_flags(flags) irqs_disabled() safe_halt() with tracing included if enabled. The arch functions can now all be inline functions rather than some of them having to be macros. Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300] Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile] Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze] Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM] Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR] Acked-by: Tony Luck <tony.luck@intel.com> [IA-64] Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R] Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU] Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS] Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC] Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC] Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390] Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score] Acked-by: Matt Fleming <matt@console-pimps.org> [SH] Acked-by: David S. Miller <davem@davemloft.net> [Sparc] Acked-by: Chris Zankel <chris@zankel.net> [Xtensa] Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha] Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300] Cc: starvik@axis.com [CRIS] Cc: jesper.nilsson@axis.com [CRIS] Cc: linux-cris-kernel@axis.com	2010-10-07 14:08:55 +01:00
Linus Torvalds	34984f54b7	Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: do not initialize PV timers on HVM if !xen_have_vector_callback xen: do not set xenstored_ready before xenbus_probe on hvm	2010-10-06 09:51:28 -07:00
Jeremy Fitzhardinge	161b0275e2	x86, mm: Add RESERVE_BRK_ARRAY() helper This is useful when converting static arrays into boot-time brk allocated objects. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4C805EEA.1080205@goop.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 22:16:54 -07:00
Yinghai Lu	16c36f743b	x86, memblock: Remove __memblock_x86_find_in_range_size() Fold it into memblock_x86_find_in_range(), and change bad_addr_size() to check_reserve_memblock(). So whole memblock_x86_find_in_range_size() code is more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CAA4DEC.4000401@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:45:43 -07:00
Yinghai Lu	1d931264af	x86-32, memblock: Make add_highpages honor early reserved ranges Originally the only early reserved range that is overlapped with high pages is "KVA RAM", but we already do remove that from the active ranges. However, It turns out Xen could have that kind of overlapping to support memory ballooning.x So we need to make add_highpage_with_active_regions() to subtract memblock reserved just like low ram; this is the proper design anyway. In this patch, refactering get_freel_all_memory_range() to make it can be used by add_highpage_with_active_regions(). Also we don't need to remove "KVA RAM" from active ranges. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CABB183.1040607@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:44:35 -07:00
Yinghai Lu	9f4c13964b	x86, memblock: Fix crashkernel allocation Cai Qian found crashkernel is broken with the x86 memblock changes. 1. crashkernel=128M@32M always reported that range is used, even if the first kernel is small and does not usethat range 2. we always got following report when using "kexec -p" Could not find a free area of memory of a000 bytes... locate_hole failed The root cause is that generic memblock_find_in_range() will try to allocate from the top of the range, whereas the kexec code was written assuming that allocation was always near the bottom and that it could blindly extend memory upward. Unfortunately the kexec code doesn't have a system for requesting the range that it really needs, so this is subject to probabilistic failures. This patch hacks around the problem by limiting the target range heuristically to below the traditional bzImage max range. This number is arbitrary and not always correct, and a much better result would be obtained by having kexec communicate this number based on the kernel header information and any appropriate command line options. Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CABAF2A.5090501@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:43:14 -07:00
Linus Torvalds	39c12be86a	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf trace scripting: Fix extern struct definitions perf ui hist browser: Fix segfault on 'a' for annotate perf tools: Fix build breakage perf, x86: Handle in flight NMIs on P4 platform oprofile, ARM: Release resources on failure oprofile: Add Support for Intel CPU Family 6 / Model 29	2010-10-05 11:57:37 -07:00
Linus Torvalds	5336377d62	modules: Fix module_bug_list list corruption race With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-05 11:29:27 -07:00
Stefano Stabellini	31e7e931cd	xen: do not initialize PV timers on HVM if !xen_have_vector_callback if !xen_have_vector_callback do not initialize PV timer unconditionally because we still don't know how many cpus are available and if there is more than one we won't be able to receive the timer interrupts on cpu > 0. This patch fixes an hang at boot when Xen does not support vector callbacks and the guest has multiple vcpus. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>	2010-10-05 13:39:23 +01:00
Andi Kleen	c62f981f93	perf, gcc-4.6: Fix set but unused variable Just dead code I believe. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: andi@firstfloor.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-05 09:48:07 +02:00
Ingo Molnar	00e8976200	Merge branch 'perf/urgent' into perf/core Conflicts: tools/perf/util/ui/browsers/hists.c Merge reason: fix the conflict and merge in changes for dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-05 09:47:14 +02:00
Borislav Petkov	366d4a43b1	x86, cpu: Fix X86_FEATURE_NOPL `ba0593bf55` cleared the aforementioned cpuid bit only on 32-bit due to various problems with Virtual PC. This somehow got lost during the 32- + 64-bit merge so restore the feature bit on 64-bit. For that, set it explicitly for non-constant arguments of cpu_has(). Update comment for future reference. Signed-off-by: Borislav Petkov <bp@alien8.de> LKML-Reference: <20101004073127.GA20305@liondog.tnic> Cc: Ryan O'Neill <ryan@innosecc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-04 11:22:24 -07:00
Linus Torvalds	5a4bbd01c8	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc [CPUFREQ] acpi-cpufreq: add missing __percpu markup	2010-10-04 11:14:21 -07:00
Thomas Gleixner	3bb9808e99	x86: Use genirq Kconfig Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100927121843.314600915@linutronix.de> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-04 11:01:15 +02:00
Andreas Herrmann	5c80cc78de	x86, amd_nb: Enable GART support for AMD family 0x15 CPUs AMD CPU family 0x15 still supports GART for compatibility reasons. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930124316.GG20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	d4fbe4f035	x86, amd: Use compute unit information to determine thread siblings This information is vital for different load balancing policies. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930124156.GF20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	6057b4d331	x86, amd: Extract compute unit information for AMD CPUs Get compute unit information from CPUID Fn8000_001E_EBX. (See AMD CPUID Specification - publication # 25481, revision 2.34, September 2010.) Note that each core on a compute unit still has a core_id of its own. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123857.GE20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	23588c38a8	x86, amd: Add support for CPUID topology extension of AMD CPUs Node information (ID, number of internal nodes) is provided via CPUID Fn8000_001e_ECX. See AMD CPUID Specification (Publication # 25481, Revision 2.34, September 2010). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123628.GD20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	420b13b60a	x86, nmi: Support NMI watchdog on newer AMD CPU families CPU families 0x12, 0x14 and 0x15 support this functionality. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123357.GC20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	3fdbf004c1	x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs Instead of adapting the CPU family check in amd_special_default_mtrr() for each new CPU family assume that all new AMD CPUs support the necessary bits in SYS_CFG MSR. Tom2Enabled is architectural (defined in APM Vol.2). Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT. In pre K8-NPT BKDG this bit is reserved (read as zero). W/o this adaption Linux would unnecessarily complain about bad MTRR settings on every new AMD CPU family, e.g. [ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM. Cc: stable@kernel.org # .32.x, .35.x Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123235.GB20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:31 -07:00
H. Peter Anvin	86ffb08519	Merge remote branch 'origin/x86/cpu' into x86/amd-nb	2010-10-01 16:18:11 -07:00
Linus Torvalds	f4a3330d76	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hpet: Fix bogus error check in hpet_assign_irq() x86, irq: Plug memory leak in sparse irq x86, cpu: After uncapping CPUID, re-run CPU feature detection	2010-10-01 15:02:41 -07:00
Robert Richter	5140434d5f	oprofile, x86: Simplify init/exit functions Now, that we only call the exit function if init succeeds with commit: `979048e` oprofile: don't call arch exit code from init code on failure we can simplify the x86 init/exit functions too. Variable using_nmi becomes obsolete. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-01 17:05:47 +02:00
Jiri Olsa	f6dedecc37	oprofile, x86: Adding backtrace dump for 32bit process in compat mode This patch implements the oprofile backtrace generation for 32 bit applications running in the 64bit environment (compat mode). With this change it's possible to get backtrace for 32bits applications under the 64bits environment using oprofile's callgraph options. opcontrol --setup -c ... opreport -l -cg ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-01 16:07:18 +02:00
Jiri Olsa	40c6b3cb64	oprofile, x86: Using struct stack_frame for 64bit processes dump Removing unnecessary struct frame_head and replacing it with struct stack_frame. The struct stack_frame is already defined and used in other places in kernel, so there's no reason to define new structure. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-01 16:07:09 +02:00
Thomas Gleixner	0219896228	x86, hpet: Fix bogus error check in hpet_assign_irq() create_irq() returns -1 if the interrupt allocation failed, but the code checks for irq == 0. Use create_irq_nr() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-30 15:57:35 -07:00
Thomas Gleixner	1cf180c94e	x86, irq: Plug memory leak in sparse irq free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this triggers a use after free caused by the fact that copying struct irq_cfg is done with memcpy, which copies the pointer not the cpumask. Fix both places. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-30 15:57:35 -07:00
Pekka Enberg	3682930623	[CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc If acpi_evaluate_object() function call doesn't fail, we must kfree() output.buffer before returning from pcc_cpufreq_do_osc(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Dave Jones <davej@redhat.com>	2010-09-30 16:14:23 -04:00
Namhyung Kim	86cf147494	[CPUFREQ] acpi-cpufreq: add missing __percpu markup acpi_perf_data is a percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Jones <davej@redhat.com>	2010-09-30 16:14:22 -04:00
Cyrill Gorcunov	03e22198d2	perf, x86: Handle in flight NMIs on P4 platform Stephane reported we've forgot to guard the P4 platform against spurious in-flight performance IRQs. Fix it. This fixes potential spurious 'dazed and confused' NMI messages. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-30 09:17:59 +02:00
Dan Carpenter	b365a85c68	x86, UV: Use allocated buffer in tlb_uv.c:tunables_read() The original code didn't check that the value returned from snprintf() was less than the size of the buffer. Although it didn't cause a runtime bug in this case, it makes the static checkers complain. Andrew Morton suggested a dynamically sized buffer would be cleaner. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Cliff Wickman <cpw@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> LKML-Reference: <20100929083118.GA6376@bicker> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-30 09:11:27 +02:00
Namhyung Kim	bd126b23a2	ACPI: add missing __percpu markup in arch/x86/kernel/acpi/cstate.c cpu_cstate_entry is a percpu pointer but was missing __percpu markup. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com>	2010-09-28 21:38:20 -04:00
H. Peter Anvin	d900329e20	x86, cpu: After uncapping CPUID, re-run CPU feature detection After uncapping the CPUID level, we need to also re-run the CPU feature detection code. This resolves kernel bugzilla 16322. Reported-by: boris64 <bugzilla.kernel.org@boris64.net> Cc: <stable@kernel.org> v2.6.29..2.6.35 LKML-Reference: <tip-@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-28 16:33:14 -07:00
Linus Torvalds	050026feae	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile	2010-09-27 21:19:27 -07:00
Linus Torvalds	6a6aa2b7e4	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Fix rounding-bug in __unmap_single x86/amd-iommu: Work around S3 BIOS bug x86/amd-iommu: Set iommu configuration flags in enable-loop x86, setup: Fix earlyprintk=serial,0x3f8,115200 x86, setup: Fix earlyprintk=serial,ttyS0,115200	2010-09-27 12:22:21 -07:00
Linus Torvalds	f0619343ce	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Catch spurious interrupts after disabling counters tracing/x86: Don't use mcount in kvmclock.c tracing/x86: Don't use mcount in pvclock.c	2010-09-27 12:21:48 -07:00
Ingo Molnar	c7a27aa465	Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2010-09-27 09:48:44 +02:00
Alexander Chumachenko	c9e2fbd909	x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile While debugging bit_spin_lock() hang, it was tracked down to gcc-4.4 misoptimization of non-inlined constant_test_bit() due to non-volatile addr when 'const volatile unsigned long addr' cast to 'unsigned long ' with subsequent unconditional jump to pause (and not to the test) leading to hang. Compiling with gcc-4.3 or disabling CONFIG_OPTIMIZE_INLINING yields inlined constant_test_bit() and correct jump, thus working around the kernel bug. Other arches than asm-x86 may implement this slightly differently; 2.6.29 mitigates the misoptimization by changing the function prototype (commit `c4295fbb60`) but probably fixing the issue itself is better. Signed-off-by: Alexander Chumachenko <ledest@gmail.com> Signed-off-by: Michael Shigorin <mike@osdn.org.ua> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-09-26 22:43:07 -07:00
Ma Ling	3b4b682bec	x86, mem: Optimize memmove for small size and unaligned cases movs instruction will combine data to accelerate moving data, however we need to concern two cases about it. 1. movs instruction need long lantency to startup, so here we use general mov instruction to copy data. 2. movs instruction is not good for unaligned case, even if src offset is 0x10, dest offset is 0x0, we avoid and handle the case by general mov instruction. Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1284664360-6138-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-09-24 18:57:11 -07:00
Jan Beulich	a46590533a	x86/hwmon: fix initialization of coretemp Using cpuid_eax() to determine feature availability on other than the current CPU is invalid. And feature availability should also be checked in the hotplug code path. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Rudolf Marek <r.marek@assembler.cz> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Guenter Roeck <guenter.roeck@ericsson.com>	2010-09-24 11:44:19 -07:00
Robert Richter	63e6be6d98	perf, x86: Catch spurious interrupts after disabling counters Some cpus still deliver spurious interrupts after disabling a counter. This caused 'undelivered NMI' messages. This patch fixes this. Introduced by: `4177c42`: perf, x86: Try to handle unknown nmis with an enabled PMU Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: Don Zickus <dzickus@redhat.com> Cc: gorcunov@gmail.com <gorcunov@gmail.com> Cc: fweisbec@gmail.com <fweisbec@gmail.com> Cc: ying.huang@intel.com <ying.huang@intel.com> Cc: ming.m.lin@intel.com <ming.m.lin@intel.com> Cc: yinghai@kernel.org <yinghai@kernel.org> Cc: andi@firstfloor.org <andi@firstfloor.org> Cc: eranian@google.com <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100915162034.GO13563@erda.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-24 12:21:41 +02:00
Ingo Molnar	7329cf0201	Merge branch 'amd-iommu/2.6.36' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2010-09-24 11:19:53 +02:00
Ingo Molnar	a5a2bad55d	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2010-09-24 09:12:05 +02:00
Daniel Drake	3e3c486012	x86, olpc: Rework BIOS signature check The XO-1.5 laptop is not currently detected as an OLPC machine because it fails this XO-1-centric check. Now that we have OLPC OFW support in the kernel, a more sensible check is to see if we found OFW during boot and check the architecture property. Also remove a now-meaningless codepath, as we're always going to have OFW support with OLPC. Signed-off-by: Daniel Drake <dsd@laptop.org> LKML-Reference: <20100923162846.D8D409D401B@zog.reactivated.net> Cc: Andres Salomon <dilinger@queued.net> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-23 11:15:00 -07:00
Daniel Drake	76fb657017	x86, olpc: Only enable PCI configuration type override on XO-1 This configuration type override is for XO-1 only and must not happen on XO-1.5. Signed-off-by: Daniel Drake <dsd@laptop.org> LKML-Reference: <20100923162805.0F6549D401B@zog.reactivated.net> Cc: Andres Solomon <dilinger@queued.net> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-23 11:14:18 -07:00
Bart Oldeman	6554287b1d	x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers. Impact: fix kernel bug such as: BUG: scheduling while atomic: dosemu.bin/19680/0x00000004 See also Ubuntu bug 455067 at https://bugs.launchpad.net/ubuntu/+source/linux/+bug/455067 Commits `4915a35e35` ("Use preempt_conditional_sti/cli in do_int3, like on x86_64.") and `3d2a71a596` ("x86, traps: converge do_debug handlers") started disabling preemption in int1 and int3 handlers on i386. The problem with vm86 is that the call to handle_vm86_trap() may jump straight to entry_32.S and never returns so preempt is never enabled again, and there is an imbalance in the preempt count. Commit `be716615fe` ("x86, vm86: fix preemption bug"), which was later (accidentally?) reverted by commit `08d68323d1` ("hw-breakpoints: modifying generic debug exception to use thread-specific debug registers") fixed the problem for debug exceptions but not for breakpoints. There are three solutions to this problem. 1. Reenable preemption before calling handle_vm86_trap(). This was the approach that was later reverted. 2. Do not disable preemption for i386 in breakpoint and debug handlers. This was the situation before October 2008. As far as I understand preemption only needs to be disabled on x86_64 because a seperate stack is used, but it's nice to have things work the same way on i386 and x86_64. 3. Let handle_vm86_trap() return instead of jumping to assembly code. By setting a flag in _TIF_WORK_MASK, either TIF_IRET or TIF_NOTIFY_RESUME, the code in entry_32.S is instructed to return to 32 bit mode from V86 mode. The logic in entry_32.S was already present to handle signals. (I chose TIF_IRET because it's slightly more efficient in do_notify_resume() in signal.c, but in fact TIF_IRET can probably be replaced by TIF_NOTIFY_RESUME everywhere.) I'm submitting approach 3, because I believe it is the most elegant and prevents future confusion. Still, an obvious preempt_conditional_cli(regs); is necessary in traps.c to correct the bug. [ hpa: This is technically a regression, but because: 1. the regression is so old, 2. the patch seems relatively high risk, justifying more testing, and 3. we're late in the 2.6.36-rc cycle, I'm queuing it up for the 2.6.37 merge window. It might, however, justify as a -stable backport at a latter time, hence Cc: stable. ] Signed-off-by: Bart Oldeman <bartoldeman@users.sourceforge.net> LKML-Reference: <alpine.DEB.2.00.1009231312330.4732@localhost.localdomain> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-23 11:07:49 -07:00
Joerg Roedel	04e0463e08	x86/amd-iommu: Fix rounding-bug in __unmap_single In the __unmap_single function the dma_addr is rounded down to a page boundary before the dma pages are unmapped. The address is later also used to flush the TLB entries for that mapping. But without the offset into the dma page the amount of pages to flush might be miscalculated in the TLB flushing path. This patch fixes this bug by using the original address to flush the TLB. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-09-23 16:26:20 +02:00
Joerg Roedel	4c894f47bb	x86/amd-iommu: Work around S3 BIOS bug This patch adds a workaround for an IOMMU BIOS problem to the AMD IOMMU driver. The result of the bug is that the IOMMU does not execute commands anymore when the system comes out of the S3 state resulting in system failure. The bug in the BIOS is that is does not restore certain hardware specific registers correctly. This workaround reads out the contents of these registers at boot time and restores them on resume from S3. The workaround is limited to the specific IOMMU chipset where this problem occurs. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-09-23 16:26:03 +02:00
Joerg Roedel	e9bf519711	x86/amd-iommu: Set iommu configuration flags in enable-loop This patch moves the setting of the configuration and feature flags out out the acpi table parsing path and moves it into the iommu-enable path. This is needed to reliably fix resume-from-s3. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-09-23 16:24:50 +02:00
Steven Rostedt	46eb3b64dd	jump label/x86/sparc64: Remove !CC_OPTIMIZE_FOR_SIZE config conditions The !CC_OPTIMIZE_FOR_SIZE was added to enable the jump label functionality because Jason noticed that the gcc option would not optimize the labels and may even hurt performance. But this is a gcc problem not a kernel one. Removing this condition should add motivation to the gcc developers to actually fix it. Cc: Jason Baron <jbaron@redhat.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 23:10:23 -04:00
Steven Rostedt	258af47479	tracing/x86: Don't use mcount in kvmclock.c The guest can use the paravirt clock in kvmclock.c which is used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for kvmclock.o. Cc: stable@kernel.org Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 23:01:19 -04:00
Jeremy Fitzhardinge	9ecd4e1689	tracing/x86: Don't use mcount in pvclock.c When using a paravirt clock, pvclock.c can be used by sched_clock(), which in turn is used by the tracing mechanism for timestamps, which leads to infinite recursion. Disable mcount/tracing for pvclock.o. Cc: stable@kernel.org Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4C9A9A3F.4040201@goop.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 23:00:50 -04:00
Jan Beulich	234bb549ee	x86, cleanups: Use clear_page/copy_page rather than memset/memcpy When operating on whole pages, use clear_page() and copy_page() in favor of memset() and memcpy(); after all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-22 15:36:49 -07:00
Steven Rostedt	95fccd465e	jump label: Remove duplicate structure for x86 The structure in the x86 jump label code uses the typedef jump_label_t, which is defined by the #ifdef arch type. The structure does not need to be duplicated there. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 17:37:43 -04:00
Jason Baron	d9f5ab7b1c	jump label: x86 support add x86 support for jump label. I'm keeping this patch separate so its clear to arch maintainers what was required for x86 support this new feature. Hopefully, it wouldn't be too painful for other archs. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <f838f49f40fbea0254036194be66dc48b598dcea.1284733808.git.jbaron@redhat.com> [ cleaned up some formatting ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 16:33:03 -04:00
Jason Baron	4c3ef6d793	jump label: Add jump_label_text_reserved() to reserve jump points Add a jump_label_text_reserved(void start, void end), so that other pieces of code that want to modify kernel text, can first verify that jump label has not reserved the instruction. Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <06236663a3a7b1c1f13576bb9eccb6d9c17b7bfe.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 16:30:46 -04:00
Jason Baron	bf5438fca2	jump label: Base patch for jump label base patch to implement 'jump labeling'. Based on a new 'asm goto' inline assembly gcc mechanism, we can now branch to labels from an 'asm goto' statment. This allows us to create a 'no-op' fastpath, which can subsequently be patched with a jump to the slowpath code. This is useful for code which might be rarely used, but which we'd like to be able to call, if needed. Tracepoints are the current usecase that these are being implemented for. Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com> [ cleaned up some formating ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-22 16:29:41 -04:00
Ingo Molnar	90edf27fb8	Merge branch 'linus' into perf/core Conflicts: kernel/hw_breakpoint.c Merge reason: resolve the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-22 18:45:01 +02:00
Linus Torvalds	87ac6fa26e	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hw breakpoints: Fix pid namespace bug x86: Fix instruction breakpoint encoding oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540) kprobes: Fix Kconfig dependency	2010-09-21 13:21:42 -07:00
Yinghai Lu	74b3c444a9	x86, setup: Fix earlyprintk=serial,0x3f8,115200 earlyprintk can take and I/O port, so we need to handle this case in the setup code too, otherwise 0x3f8 will be treated as a baud rate. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C7B05A6.4010801@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-21 10:18:33 -07:00
Yinghai Lu	83d9f65bda	x86, setup: Fix earlyprintk=serial,ttyS0,115200 Torsten reported that there is garbage output, after commit `8fee13a48e` (x86, setup: enable early console output from the decompressor) It turns out we missed the offset for that case. Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4C7B0578.8090807@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-21 10:18:14 -07:00
Ingo Molnar	7ed569206e	Merge commit 'v2.6.36-rc5' into perf/core Merge reason: Pick up the latest fixes in -rc5. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-21 13:55:11 +02:00
Jiri Olsa	bb7ab785ad	oprofile: Add Support for Intel CPU Family 6 / Model 29 This patch adds CPU type detection for dunnington processor (Family 6 / Model 29) to be identified as core 2 family cpu type (wikipedia source). I tested oprofile on Intel(R) Xeon(R) CPU E7440 reporting itself as model 29, and it runs without an issue. Spec: http://www.intel.com/Assets/en_US/PDF/specupdate/320336.pdf Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Andi Kleen <ak@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-09-21 12:22:48 +02:00
Rusty Russell	9b6efcd2e2	lguest: update comments to reflect LHCALL_LOAD_GDT_ENTRY. We used to have a hypercall which reloaded the entire GDT, then we switched to one which loaded a single entry (to match the IDT code). Some comments were not updated, so fix them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reported by: Eviatar Khen <eviatarkhen@gmail.com>	2010-09-21 10:54:02 +09:30
H. Peter Anvin	c2b9ff24a0	x86, cpu: Re-run get_cpu_cap() after adjusting the CPUID level At least on Intel, adjusting the max CPUID level can expose new CPUID features, so we need to re-run get_cpu_cap() after changing the CPUID level. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-20 18:03:28 -07:00
H. Peter Anvin	ce5f68246b	x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line When we're using MWAIT for play_dead, explicitly CLFLUSH the cache line before executing MONITOR. This is a potential workaround for the Xeon 7400 erratum AAI65 after having a spurious wakeup and returning around the loop. "Potential" here because it is not certain that that erratum could actually trigger; however, the CLFLUSH should be harmless. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Venkatesh Pallipadi <venki@google.com> Cc: Asit Mallick <asit.k.mallick@intel.com> Cc: Arjan van de Ven <arjan@linux.kernel.org> Cc: Len Brown <lenb@kernel.org>	2010-09-20 15:51:59 -07:00
Jason Baron	fa6f2cc770	jump label: Make text_poke_early() globally visible Make text_poke_early available outside of alternative.c. The jump label patchset wants to make use of it in order to set up the optimal no-op sequences at run-time. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <04cfddf2ba77bcabfc3e524f1849d871d6a1cf9d.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-20 18:19:51 -04:00
Jason Baron	f49aa44856	jump label: Make dynamic no-op selection available outside of ftrace Move Steve's code for finding the best 5-byte no-op from ftrace.c to alternative.c. The idea is that other consumers (in this case jump label) want to make use of that code. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <96259ae74172dcac99c0020c249743c523a92e18.1284733808.git.jbaron@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-09-20 18:19:39 -04:00
Andreas Herrmann	23ac4ae827	x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB The file names are somehow misleading as the code is not specific to AMD K8 CPUs anymore. The files accomodate code for other AMD CPU northbridges as well. Same is true for the config option which is valid for AMD CPU northbridges in general and not specific to K8. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100917160343.GD4958@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-20 14:22:58 -07:00
Arnaud Lacombe	838a2e55e6	kbuild: migrate all arch to the kconfig mainmenu upgrade Signed-off-by: Arnaud Lacombe <lacombar@gmail.com> Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Michal Marek <mmarek@suse.cz>	2010-09-19 22:54:11 -04:00
Thomas Gleixner	995bd3bb5c	x86: Hpet: Avoid the comparator readback penalty Due to the overly intelligent design of HPETs, we need to workaround the problem that the compare value which we write is already behind the actual counter value at the point where the value hits the real compare register. This happens for two reasons: 1) We read out the counter, add the delta and write the result to the compare register. When a NMI or SMI hits between the read out and the write then the counter can be ahead of the event already 2) The write to the compare register is delayed by up to two HPET cycles in certain chipsets. We worked around this by reading back the compare register to make sure that the written value has hit the hardware. For certain ICH9+ chipsets this can require two readouts, as the first one can return the previous compare register value. That's bad performance wise for the normal case where the event is far enough in the future. As we already know that the write can be delayed by up to two cycles we can avoid the read back of the compare register completely if we make the decision whether the delta has elapsed already or not based on the following calculation: cmp = event - actual_count; If cmp is less than 8 HPET clock cycles, then we decide that the event has happened already and return -ETIME. That covers the above #1 and #2 problems which would cause a wait for HPET wraparound (~306 seconds). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Nix <nix@esperi.org.uk> Tested-by: Artur Skawina <art.08.09@gmail.com> Cc: Damien Wyart <damien.wyart@free.fr> Tested-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Tested-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <alpine.LFD.2.00.1009151500060.2416@localhost6.localdomain6>	2010-09-18 12:09:13 +02:00
H. Peter Anvin	a68e5c94f7	x86, hotplug: Move WBINVD back outside the play_dead loop On processors with hyperthreading, when only one thread is offlined the other thread can cause a spurious wakeup on the idled thread. We do not want to re-WBINVD when that happens. Ideally, we should simply skip WBINVD unless we're the last thread on a particular core to shut down, but there might be similar issues elsewhere in the system. Thus, revert to previous behavior of only WBINVD outside the loop. Partly as a result, remove the mb()'s around it: they are not necessary since wbinvd() is a serializing instruction, but they were intended to make sure the compiler didn't do any funny loop optimizations. Reported-by: Asit Mallick <asit.k.mallick@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Arjan van de Ven <arjan@linux.kernel.org> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.hl> LKML-Reference: <tip-ea53069231f9317062910d6e772cca4ce93de8c8@git.kernel.org>	2010-09-17 17:10:23 -07:00
H. Peter Anvin	ea53069231	x86, hotplug: Use mwait to offline a processor, fix the legacy case The code in native_play_dead() has a number of problems: 1. We should use MWAIT when available, to put ourselves into a deeper sleep state. 2. We use the existence of CLFLUSH to determine if WBINVD is safe, but that is totally bogus -- WBINVD is 486+, whereas CLFLUSH is a much later addition. 3. We should do WBINVD inside the loop, just in case of something like setting an A bit on page tables. Pointed out by Arjan van de Ven. This code is based in part of a previous patch by Venki Pallipadi, but unlike that patch this one keeps all the detection code local instead of pre-caching a bunch of information. We're shutting down the CPU; there is absolutely no hurry. This patch moves all the code to C and deletes the global wbinvd_halt() which is broken anyway. Originally-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.hl> LKML-Reference: <20090522232230.162239000@intel.com>	2010-09-17 15:39:11 -07:00
H. Peter Anvin	bc83cccc76	x86, mwait: Move mwait constants to a common header file We have MWAIT constants spread across three different .c files, for no good reason. Move them all into a common header file. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <tip-*@git.kernel.org>	2010-09-17 15:36:40 -07:00
Andreas Herrmann	900f9ac9f1	x86, k8-gart: Decouple handling of garts and northbridges So far we only provide num_k8_northbridges. This is required in different areas (e.g. L3 cache index disable, GART). But not all AMD CPUs provide a GART. Thus it is useful to split off the GART handling from the generic caching of AMD northbridge misc devices. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100917160254.GC4958@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-17 13:26:21 -07:00
Andreas Herrmann	3518dd14ca	x86, cacheinfo: Fix dependency of AMD L3 CID L3 cache index disable code uses PCI accesses to AMD northbridge functions. Currently the code is #ifdef CONFIG_CPU_SUP_AMD. But it should be #if (defined(CONFIG_CPU_SUP_AMD) && defined(CONFIG_PCI)) which in the end is a dependency to K8_NB. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100917160744.GF4958@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-17 13:25:56 -07:00
Cliff Wickman	3ee48b6af4	mm, x86: Saving vmcore with non-lazy freeing of vmas During the reading of /proc/vmcore the kernel is doing ioremap()/iounmap() repeatedly. And the buildup of un-flushed vm_area_struct's is causing a great deal of overhead. (rb_next() is chewing up most of that time). This solution is to provide function set_iounmap_nonlazy(). It causes a subsequent call to iounmap() to immediately purge the vma area (with try_purge_vmap_area_lazy()). With this patch we have seen the time for writing a 250MB compressed dump drop from 71 seconds to 44 seconds. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: kexec@lists.infradead.org Cc: <stable@kernel.org> LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-17 09:11:56 +02:00
Linus Torvalds	a5b617368c	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hpet: Work around hardware stupidity x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y x86, cpufeature: Suppress compiler warning with gcc 3.x x86, UV: Fix initialization of max_pnode	2010-09-16 19:38:08 -07:00
Frederic Weisbecker	89e45aac42	x86: Fix instruction breakpoint encoding Lengths and types of breakpoints are encoded in a half byte into CPU registers. However when we extract these values and store them, we add a high half byte part to them: 0x40 to the length and 0x80 to the type. When that gets reloaded to the CPU registers, the high part is masked. While making the instruction breakpoints available for perf, I zapped that high part on instruction breakpoint encoding and that broke the arch -> generic translation used by ptrace instruction breakpoints. Writing dr7 to set an inst breakpoint was then failing. There is no apparent reason for these high parts so we could get rid of them altogether. That's an invasive change though so let's do that later and for now fix the problem by restoring that inst breakpoint high part encoding in this sole patch. Reported-by: Kelvie Wong <kelvie@ieee.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Cc: Will Deacon <will.deacon@arm.com>	2010-09-17 03:24:13 +02:00
Patrick Simmons	c33f543d32	oprofile: Add Support for Intel CPU Family 6 / Model 22 (Intel Celeron 540) This patch adds CPU type detection for the Intel Celeron 540, which is part of the Core 2 family according to Wikipedia; the family and ID pair is absent from the Volume 3B table referenced in the source code comments. I have tested this patch on an Intel Celeron 540 machine reporting itself as Family 6 Model 22, and OProfile runs on the machine without issue. Spec: http://download.intel.com/design/mobile/SPECUPDT/317667.pdf Signed-off-by: Patrick Simmons <linuxrocks123@netscape.net> Acked-by: Andi Kleen <ak@linux.intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-09-16 12:35:56 +02:00
Suresh Siddha	fa47f7e528	x86, x2apic: Simplify apic init in SMP and UP builds Move enable_IR_x2apic() inside the default_setup_apic_routing(), and for SMP platforms, move the default_setup_apic_routing() after smp_sanity_check(). This cleans up the code that tries to avoid multiple calls to default_setup_apic_routing() when smp_sanity_check() fails (which goes through the APIC_init_uniprocessor() path). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100827181049.173087246@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-15 17:37:10 -07:00
Suresh Siddha	62a92f4c69	x86, intr-remap: Remove IRTE setup duplicate code Remove IRTE setup duplicate code with prepare_irte(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100827181049.095067319@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-15 17:36:45 -07:00
Suresh Siddha	75e3cfbed6	x86, intr-remap: Set redirection hint in the IRTE Currently the redirection hint in the interrupt-remapping table entry is set to 0, which means the remapped interrupt is directed to the processors listed in the destination. So in logical flat mode in the presence of intr-remapping, this results in a single interrupt multi-casted to multiple cpu's as specified by the destination bit mask. But what we really want is to send that interrupt to one of the cpus based on the lowest priority delivery mode. Set the redirection hint in the IRTE to '1' to indicate that we want the remapped interrupt to be directed to only one of the processors listed in the destination. This fixes the issue of same interrupt getting delivered to multiple cpu's in the logical flat mode in the presence of interrupt-remapping. While there is no functional issue observed with this behavior, this will impact performance of such configurations (<=8 cpu's using logical flat mode in the presence of interrupt-remapping) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100827181049.013051492@sbsiddha-MOBL3.sc.intel.com> Cc: Weidong Han <weidong.han@intel.com> Cc: <stable@kernel.org> # [v2.6.32+] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-15 17:36:37 -07:00
Udo van den Heuvel	892df7f81c	x86: HPET force enable for CX700 / VIA Epia LT Allow using HPET with the hpet=force command line option on VIA EPIA CX700 systems. Signed-off-by: Udo van den Heuvel <udovdh@xs4all.nl> Cc: Robert Hancock <hancockrwd@gmail.com> LKML-Reference: <4C8F04DC.5060303@xs4all.nl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-09-15 16:27:04 +02:00
Namhyung Kim	6abded71d7	kprobes: Remove __dummy_buf Remove __dummy_buf which is needed for kallsyms_lookup only. use kallsysm_lookup_size_offset instead. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> LKML-Reference: <1284512670-2369-5-git-send-email-namhyung@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-15 10:44:02 +02:00
Namhyung Kim	6376b22975	kprobes: Make functions static Make following (internal) functions static to make sparse happier :-) * get_optimized_kprobe: only called from static functions * kretprobe_table_unlock: _lock function is static * kprobes_optinsn_template_holder: never called but holding asm code Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> LKML-Reference: <1284512670-2369-4-git-send-email-namhyung@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-15 10:44:01 +02:00
Ingo Molnar	3aabae7d9d	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2010-09-15 10:27:31 +02:00
Roland McGrath	eefdca043e	x86-64, compat: Retruncate rax after ia32 syscall entry tracing In commit `d4d6715`, we reopened an old hole for a 64-bit ptracer touching a 32-bit tracee in system call entry. A %rax value set via ptrace at the entry tracing stop gets used whole as a 32-bit syscall number, while we only check the low 32 bits for validity. Fix it by truncating %rax back to 32 bits after syscall_trace_enter, in addition to testing the full 64 bits as has already been added. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-14 16:08:47 -07:00
H. Peter Anvin	36d001c70d	x86-64, compat: Test %rax for the syscall number, not %eax On 64 bits, we always, by necessity, jump through the system call table via %rax. For 32-bit system calls, in theory the system call number is stored in %eax, and the code was testing %eax for a valid system call number. At one point we loaded the stored value back from the stack to enforce zero-extension, but that was removed in checkin `d4d6715016`. An actual 32-bit process will not be able to introduce a non-zero-extended number, but it can happen via ptrace. Instead of re-introducing the zero-extension, test what we are actually going to use, i.e. %rax. This only adds a handful of REX prefixes to the code. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org> Cc: Roland McGrath <roland@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>	2010-09-14 16:08:46 -07:00
H. Peter Anvin	c41d68a513	compat: Make compat_alloc_user_space() incorporate the access_ok() compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_access_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: Ben Hawkes <hawkes@sota.gen.nz> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Tony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@kernel.org>	2010-09-14 16:08:45 -07:00
Thomas Gleixner	54ff7e595d	x86: hpet: Work around hardware stupidity This more or less reverts commits `08be979` (x86: Force HPET readback_cmp for all ATI chipsets) and `30a564be` (x86, hpet: Restrict read back to affected ATI chipsets) to the status of commit `8da854c` (x86, hpet: Erratum workaround for read after write of HPET comparator). The delta to commit `8da854c` is mostly comments and the change from WARN_ONCE to printk_once as we know the call path of this function already. This needs really in depth explanation: First of all the HPET design is a complete failure. Having a counter compare register which generates an interrupt on matching values forces the software to do at least one superfluous readback of the counter register. While it is nice in theory to program "absolute" time events it is practically useless because the timer runs at some absurd frequency which can never be matched to real world units. So we are forced to calculate a relative delta and this forces a readout of the actual counter value, adding the delta and programming the compare register. When the delta is small enough we run into the danger that we program a compare value which is already in the past. Due to the compare for equal nature of HPET we need to read back the counter value after writing the compare rehgister (btw. this is necessary for absolute timeouts as well) to make sure that we did not miss the timer event. We try to work around that by setting the minimum delta to a value which is larger than the theoretical time which elapses between the counter readout and the compare register write, but that's only true in theory. A NMI or SMI which hits between the readout and the write can easily push us beyond that limit. This would result in waiting for the next HPET timer interrupt until the 32bit wraparound of the counter happens which takes about 306 seconds. So we designed the next event function to look like: match = read_cnt() + delta; write_compare_ref(match); return read_cnt() < match ? 0 : -ETIME; At some point we got into trouble with certain ATI chipsets. Even the above "safe" procedure failed. The reason was that the write to the compare register was delayed probably for performance reasons. The theory was that they wanted to avoid the synchronization of the write with the HPET clock, which is understandable. So the write does not hit the compare register directly instead it goes to some intermediate register which is copied to the real compare register in sync with the HPET clock. That opens another window for hitting the dreaded "wait for a wraparound" problem. To work around that "optimization" we added a read back of the compare register which either enforced the update of the just written value or just delayed the readout of the counter enough to avoid the issue. We unfortunately never got any affirmative info from ATI/AMD about this. One thing is sure, that we nuked the performance "optimization" that way completely and I'm pretty sure that the result is worse than before some HW folks came up with those. Just for paranoia reasons I added a check whether the read back compare register value was the same as the value we wrote right before. That paranoia check triggered a couple of years after it was added on an Intel ICH9 chipset. Venki added a workaround (commit `8da854c`) which was reading the compare register twice when the first check failed. We considered this to be a penalty in general and restricted the readback (thus the wasted CPU cycles) to the known to be affected ATI chipsets. This turned out to be a utterly wrong decision. 2.6.35 testers experienced massive problems and finally one of them bisected it down to commit `30a564be` which spured some further investigation. Finally we got confirmation that the write to the compare register can be delayed by up to two HPET clock cycles which explains the problems nicely. All we can do about this is to go back to Venki's initial workaround in a slightly modified version. Just for the record I need to say, that all of this could have been avoided if hardware designers and of course the HPET committee would have thought about the consequences for a split second. It's out of my comprehension why designing a working timer is so hard. There are two ways to achieve it: 1) Use a counter wrap around aware compare_reg <= counter_reg implementation instead of the easy compare_reg == counter_reg Downsides: - It needs more silicon. - It needs a readout of the counter to apply a relative timeout. This is necessary as the counter does not run in any useful (and adjustable) frequency and there is no guarantee that the counter which is used for timer events is the same which is used for reading the actual time (and therefor for calculating the delta) Upsides: - None 2) Use a simple down counter for relative timer events Downsides: - Absolute timeouts are not possible, which is not a problem at all in the context of an OS and the expected max. latencies/jitter (also see Downsides of #1) Upsides: - It needs less or equal silicon. - It works ALWAYS - It is way faster than a compare register based solution (One write versus one write plus at least one and up to four reads) I would not be so grumpy about all of this, if I would not have been ignored for many years when pointing out these flaws to various hardware folks. I really hate timers (at least those which seem to be designed by janitors). Though finally we got a reasonable explanation plus a solution and I want to thank all the folks involved in chasing it down and providing valuable input to this. Bisected-by: Nix <nix@esperi.org.uk> Reported-by: Artur Skawina <art.08.09@gmail.com> Reported-by: Damien Wyart <damien.wyart@free.fr> Reported-by: John Drescher <drescherjm@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: stable@kernel.org Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-09-15 00:55:13 +02:00
basile@opensource.dyc.edu	08c2b394b9	x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y The arch/x86/Makefile uses scripts/gcc-x86_$(BITS)-has-stack-protector.sh to check if cc1 supports -fstack-protector. When -fPIE is passed to cc1, these scripts fail causing stack protection to be disabled even when it is available. This fix is similar to commit `c47efe5548` Reported-by: Kai Dietrich <mail@cleeus.de> Signed-off-by: Magnus Granberg <zorry@gentoo.org> LKML-Reference: <20100913101319.748A1148E216@opensource.dyc.edu> Signed-off-by: Anthony G. Basile <basile@opensource.dyc.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-13 15:53:16 -07:00
Tetsuo Handa	2fd818642a	x86, cpufeature: Suppress compiler warning with gcc 3.x Gcc 3.x generates a warning arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has': arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraints on each file. But static_cpu_has() for gcc 3.x does not need __static_cpu_has(). Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> LKML-Reference: <201008300127.o7U1RC6Z044051@www262.sakura.ne.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-13 14:48:41 -07:00
Stephane Eranian	b0b2072df3	perf_events: Fix BTS interrupt handling to avoid being dazed by NMI (v2) Fix a bug introduced with commit `de725de` and the change in the meaning of the return value of intel_pmu_handle_irq(). With the current code, when you are using the BTS, you get 'dazed by NMI' each time the BTS buffer fills up. BTS does interrupt on the PMU vector, thus NMI. You need to take this into account in the return value of the function. This version fixes initial patch which was missing changes to perf_event_intel_ds.c. Signed-off-by: Stephane Eranian <eranian@google.com> Acked-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: paulus@samba.org Cc: davem@davemloft.net Cc: fweisbec@gmail.com Cc: perfmon2-devel@lists.sf.net Cc: eranian@gmail.com Cc: robert.richter@amd.com LKML-Reference: <4c8a1686.aae9d80a.5aa4.5e35@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-13 08:43:40 +02:00
Joe Perches	d0ed0c3266	x86: Remove pr_<level> uses of KERN_<level> Signed-off-by: Joe Perches <joe@perches.com> Cc: Jiri Kosina <trivial@kernel.org> LKML-Reference: <d40c60f4b036a26db8492848695bdafaa3b42791.1284267142.git.joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-12 09:32:31 +02:00
Peter Zijlstra	5ee5e97ee9	x86, tsc: Fix a preemption leak in restore_sched_clock_state() A real life genuine preemption leak.. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-09-10 18:17:45 -07:00
Venkatesh Pallipadi	351e5a703a	x86, mtrr: Support mtrr lookup for range spanning across MTRR range mtrr_type_lookup [start:end] looked up the resultant MTRR type for that range, based on fixed and all variable MTRR ranges. It did check for multiple MTRR var ranges overlapping [start:end] and returned the net type. However, if the [start:end] range spanned across any var MTRR range, mtrr_type_lookup would return an error return of 0xFE. This was based on typical usage of mtrr_type_lookup in PAT mapping, where region being mapped would not normally span across MTRR ranges and also trying to keep the code simple. Mark recently reported the problem with this limitation. When there are two continguous MTRR's of type "writeback" and if there is a memory mapping over a region starting in one MTRR range and ending in another MTRR range, such mapping will fallback to "uncached" due to the above limitation. Change below adds support for such lookups spanning multiple MTRR ranges. We now have a wrapper mtrr_type_lookup that dynamically splits such a region into smaller chunks that fit within one MTRR range and does a __mtrr_type_lookup on it and combine the results later. Reported-by: Mark Langsdorf <mark.langsdorf@amd.com> Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1284159350-19841-3-git-send-email-venki@google.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-10 16:11:20 -07:00
Venkatesh Pallipadi	a7f07cfbaa	x86, mtrr: Refactor MTRR type overlap check code Move the MTRR type overlap check into a new function. No functional change in this patch. Just making it easier to add multiple region overlap check in the following patch. Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1284159350-19841-2-git-send-email-venki@google.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-10 16:11:10 -07:00
Jack Steiner	36ac4b987b	x86, UV: Fix initialization of max_pnode Fix calculation of "max_pnode" for systems where the the highest blade has neither cpus or memory. (And, yes, although rare this does occur). Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20100910150808.GA19802@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-10 17:15:49 +02:00
Linus Torvalds	be6200aac9	Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Perform hardware_enable in CPU_STARTING callback KVM: i8259: fix migration KVM: fix i8259 oops when no vcpus are online KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts	2010-09-10 08:02:45 -07:00
Brian Gerst	db7829c6cc	x86, percpu: Optimize this_cpu_ptr Allow arches to implement __this_cpu_ptr, and provide an x86 version. Before: movq $foo, %rax movq %gs:this_cpu_off, %rdx addq %rdx, %rax After: movq $foo, %rax addq %gs:this_cpu_off, %rax The benefit is doing it in one less instruction and not clobbering a temporary register. tj: * Beefed up the comment a bit and renamed in-macro temp variable to match neighboring macros. * Folded fix for const pointer case found in linux-next. * Fixed sparse notation. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-09-10 10:56:47 +02:00
Brian Gerst	b2b57fe053	x86, fpu: Merge fpu_save_init() Make 64-bit use the 32-bit version of fpu_save_init(). Remove unused clear_fpu_state(). Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-13-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:17:36 -07:00
Brian Gerst	58a992b9cb	x86-32, fpu: Rewrite fpu_save_init() Rewrite fpu_save_init() to prepare for merging with 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-12-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:17:31 -07:00
Brian Gerst	eec73f813a	x86, fpu: Remove PSHUFB_XMM5_* macros The PSHUFB_XMM5_* macros are no longer used. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-11-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:17:25 -07:00
Brian Gerst	8eb91a577d	x86, fpu: Remove unnecessary ifdefs from i387 code. Remove ifdefs for code that the compiler can optimize away on 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-10-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:17:18 -07:00
Brian Gerst	a334fe43d8	x86-32, fpu: Remove math_emulate stub check_fpu() in bugs.c halts boot if no FPU is found and math emulation isn't enabled. Therefore this stub will never be used. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-9-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:17:11 -07:00
Brian Gerst	820241356d	x86-64, fpu: Simplify constraints for fxsave/fxtstor Use the "R" constraint (legacy register) instead of listing all the possible registers. Clean up the comments as well. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-8-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:17:06 -07:00
Brian Gerst	10c11f3049	x86-64, fpu: Fix %cs value in convert_from_fxsr() While %ds still contains the userspace selector, %cs is KERNEL_CS at this point. Always get %cs from pt_regs even for the current task. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-7-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:16:58 -07:00
Brian Gerst	a4d4fbc773	x86-64, fpu: Disable preemption when using TS_USEDFPU Consolidates code and fixes the below race for 64-bit. commit 9fa2f37bfeb798728241cc4a19578ce6e4258f25 Author: torvalds <torvalds> Date: Tue Sep 2 07:37:25 2003 +0000 Be a lot more careful about TS_USEDFPU and preemption We had some races where we testecd (or set) TS_USEDFPU together with sequences that depended on the setting (like clearing or setting the TS flag in %cr0) and we could be preempted in between, which screws up the FPU state, since preemption will itself change USEDFPU and the TS flag. This makes it a lot more explicit: the "internal" low-level FPU functions ("__xxxx_fpu()") all require preemption to be disabled, and the exported "real" functions will make sure that is the case. One case - in __switch_to() - was switched to the non-preempt-safe internal version, since the scheduler itself has already disabled preemption. BKrev: 3f5448b5WRiQuyzAlbajs3qoQjSobw Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-6-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:16:45 -07:00
Brian Gerst	bfd946cb89	x86, fpu: Merge __save_init_fpu() __save_init_fpu() is identical for 32-bit and 64-bit. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-5-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:16:30 -07:00
Brian Gerst	51115d4d45	x86, fpu: Merge tolerant_fwait() Commit `e2e75c91` merged the math exception handler, allowing both 32-bit and 64-bit to handle math exceptions from kernel mode. Switch to using the 64-bit version of tolerant_fwait() without fnclex, which simply ignores the exception if one is still pending from userspace. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-4-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:16:25 -07:00
Brian Gerst	6ac8bac268	x86, fpu: Merge fpu_init() Make fpu_init() handle 32-bit setup. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-3-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:16:20 -07:00
Brian Gerst	2df7a6e9e8	x86: Use correct type for %cr4 %cr4 is 64-bit in 64-bit mode (although the upper 32-bits are currently reserved). Use unsigned long for the temporary variable to get the right size. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1283563039-3466-2-git-send-email-brgerst@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-09 14:16:13 -07:00
Peter Zijlstra	15ac9a395a	perf: Remove the sysfs bits Neither the overcommit nor the reservation sysfs parameter were actually working, remove them as they'll only get in the way. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:46:31 +02:00
Peter Zijlstra	a4eaf7f146	perf: Rework the PMU methods Replace pmu::{enable,disable,start,stop,unthrottle} with pmu::{add,del,start,stop}, all of which take a flags argument. The new interface extends the capability to stop a counter while keeping it scheduled on the PMU. We replace the throttled state with the generic stopped state. This also allows us to efficiently stop/start counters over certain code paths (like IRQ handlers). It also allows scheduling a counter without it starting, allowing for a generic frozen state (useful for rotating stopped counters). The stopped state is implemented in two different ways, depending on how the architecture implemented the throttled state: 1) We disable the counter: a) the pmu has per-counter enable bits, we flip that b) we program a NOP event, preserving the counter state 2) We store the counter state and ignore all read/overflow events Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:46:30 +02:00
Peter Zijlstra	33696fc0d1	perf: Per PMU disable Changes perf_disable() into perf_pmu_disable(). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:46:29 +02:00
Peter Zijlstra	24cd7f54a0	perf: Reduce perf_disable() usage Since the current perf_disable() usage is only an optimization, remove it for now. This eases the removal of the __weak hw_perf_enable() interface. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:46:29 +02:00
Peter Zijlstra	b0a873ebbf	perf: Register PMU implementations Simple registration interface for struct pmu, this provides the infrastructure for removing all the weak functions. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:46:28 +02:00
Peter Zijlstra	51b0fe3954	perf: Deconstify struct pmu sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"` Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> Cc: stephane eranian <eranian@googlemail.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Lin Ming <ming.m.lin@intel.com> Cc: Yanmin <yanmin_zhang@linux.intel.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:46:27 +02:00
Ingo Molnar	2aa61274ef	Merge branch 'perf/urgent' into perf/core Merge reason: Pick up pending fixes before applying dependent new changes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 20:40:08 +02:00
Cliff Wickman	37a2f9f30a	x86, kdump: Change copy_oldmem_page() to use cached addressing The copy of /proc/vmcore to a user buffer proceeds much faster if the kernel addresses memory as cached. With this patch we have seen an increase in transfer rate from less than 15MB/s to 80-460MB/s, depending on size of the transfer. This makes a big difference in time needed to save a system dump. Signed-off-by: Cliff Wickman <cpw@sgi.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: kexec@lists.infradead.org Cc: <stable@kernel.org> # as far back as it would apply LKML-Reference: <E1OtMLz-0001yp-Ia@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-09 09:46:23 +02:00
Andre Przywara	aeb9c7d618	x86, kvm: add new AMD SVM feature bits The recently updated CPUID specification names new SVM feature bits. Add them to the list of reported features. Signed-off-by: Andre Przywara <andre.przywara@amd,com> LKML-Reference: <1283778860-26843-5-git-send-email-andre.przywara@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-08 13:34:15 -07:00
Andre Przywara	6d886fd042	x86, cpu: Fix allowed CPUID bits for KVM guests The AMD extensions to AVX (FMA4, XOP) work on the same YMM register set as AVX, so they are safe for guests to use, as long as AVX itself is allowed. Add F16C and AES on the way for the same reasons. Signed-off-by: Andre Przywara <andre.przywara@amd.com> LKML-Reference: <1283778860-26843-4-git-send-email-andre.przywara@amd.com> Acked-by: Avi Kivity <avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-08 13:34:15 -07:00
Andre Przywara	33ed82fb6c	x86, cpu: Update AMD CPUID feature bits AMD's public CPUID specification has been updated and some bits have got names. Add them to properly describe new CPU features. Signed-off-by: Andre Przywara <andre.przywara@amd.com> LKML-Reference: <1283778860-26843-3-git-send-email-andre.przywara@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-08 13:34:15 -07:00
Andre Przywara	7ef8aa72ab	x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit The AMD SSE5 feature set as-it has been replaced by some extensions to the AVX instruction set. Thus the bit formerly advertised as SSE5 is re-used for one of these extensions (XOP). Although this changes the /proc/cpuinfo output, it is not user visible, as there are no CPUs (yet) having this feature. To avoid confusion this should be added to the stable series, too. Cc: stable@kernel.org [.32.x .34.x, .35.x] Signed-off-by: Andre Przywara <andre.przywara@amd.com> LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-08 13:32:55 -07:00
Linus Torvalds	1faa6ec8cc	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks io-mapping: Fix the address space annotations x86: Fix the address space annotations of iomap_atomic_prot_pfn() x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev	2010-09-08 11:14:10 -07:00
Linus Torvalds	899edae615	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Try to handle unknown nmis with an enabled PMU perf, x86: Fix handle_irq return values perf, x86: Fix accidentally ack'ing a second event on intel perf counter oprofile, x86: fix init_sysfs() function stub lockup_detector: Sync touch_*_watchdog back to old semantics tracing: Fix a race in function profile oprofile, x86: fix init_sysfs error handling perf_events: Fix time tracking for events with pid != -1 and cpu != -1 perf: Initialize callchains roots's childen hits oprofile: fix crash when accessing freed task structs	2010-09-08 11:13:16 -07:00
Gleb Natapov	eebb5f31b8	KVM: i8259: fix migration Top of kvm_kpic_state structure should have the same memory layout as kvm_pic_state since it is copied by memcpy. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-09-08 14:50:58 -03:00
Avi Kivity	ae0635b358	KVM: fix i8259 oops when no vcpus are online If there are no vcpus, found will be NULL. Check before doing anything with it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-09-08 14:50:56 -03:00
Avi Kivity	16518d5ada	KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b operands are 64-bit. Fix by adding val64 and orig_val64 union members to struct operand, and using them where needed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-09-08 14:50:55 -03:00
Christian Dietrich	0f1cf415f0	x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI The ACPI/X86_IO_ACPI ifdef isn't necessary at this point, because it is checked in an outer ifdef level already and has no effect here. Cleanup only, no functional effect. Signed-off-by: Christian Dietrich <qy03fugy@stud.informatik.uni-erlangen.de> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: vamos-dev@i4.informatik.uni-erlangen.de LKML-Reference: <d4376e6d79b8dc0f89a4b3ce4a880904a7b93ead.1283782701.git.qy03fugy@stud.informatik.uni-erlangen.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-08 08:14:02 +02:00
Linus Torvalds	d56557af19	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: bus speed strings should be const PCI hotplug: Fix build with CONFIG_ACPI unset PCI: PCIe: Remove the port driver module exit routine PCI: PCIe: Move PCIe PME code to the pcie directory PCI: PCIe: Disable PCIe port services during port initialization PCI: PCIe: Ask BIOS for control of all native services at once ACPI/PCI: Negotiate _OSC control bits before requesting them ACPI/PCI: Do not preserve _OSC control bits returned by a query ACPI/PCI: Make acpi_pci_query_osc() return control bits ACPI/PCI: Reorder checks in acpi_pci_osc_control_set() PCI: PCIe: Introduce commad line switch for disabling port services PCI: PCIe AER: Introduce pci_aer_available() x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs	2010-09-07 16:00:17 -07:00
Alexander van Heukelum	fe8e0c25ca	x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE The irq stacks, located in the percpu-area, need to be THREAD_SIZE aligned. Add the infrastucture to align percpu variables to larger-than-pagesize amounts within the percpu area, and use it to specify the alignment for the irq stacks. Also align the percpu area itself to THREAD_SIZE. This should make irq stacks work with 8K THREAD_SIZE. Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Cc: Tejun Heo <tj@kernel.org> Cc: hch@lst.de LKML-Reference: <1283799222.15941.1393621887@webmail.messagingengine.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-07 05:07:00 +02:00
Jin Dongming	592091c0e2	therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal' In unexpected_thermal_interrupt(), "LVT TMR interrupt" is used in error message. I don't think TMR is a suitable abbreviation for thermal. 1.TMR has been used in IA32 Architectures Software Developer's Manual, and is the abbreviation for Trigger Mode Register. 2.There is not an standard abbreviation "TMR" defined for thermal in IA32 Architectures Software Developer's Manual. 3.Though we could understand it as Thermal Monitor Register, it is easy to be misunderstood as a TIMER interrupt also. I think this patch will fix it. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Jean Delvare <khali@linux-fr.org> Cc: Brown Len <len.brown@intel.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Fenghua Yu <fenghua.yu@intel.com> LKML-Reference: <4C7C492D.5020704@np.css.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 20:26:50 +02:00
Andreas Herrmann	1389298f7d	x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks kobject_add_internal failed for threshold_bank2 with -EEXIST, don't try to register things with the same name in the same directory: Pid: 1, comm: swapper Tainted: G W 2.6.31 #1 Call Trace: [<ffffffff81161b07>] ? kobject_add_internal+0x156/0x180 [<ffffffff81161cc0>] ? kobject_add+0x66/0x6b [<ffffffff81161793>] ? kobject_init+0x42/0x82 [<ffffffff81161cf9>] ? kobject_create_and_add+0x34/0x63 [<ffffffff81393963>] ? threshold_create_bank+0x14f/0x259 [<ffffffff8139310a>] ? mce_create_device+0x8d/0x1b8 [<ffffffff81646497>] ? threshold_init_device+0x3f/0x80 [<ffffffff81646458>] ? threshold_init_device+0x0/0x80 [<ffffffff81009050>] ? do_one_initcall+0x4f/0x143 [<ffffffff816413a0>] ? kernel_init+0x14c/0x1a2 [<ffffffff8100c8da>] ? child_rip+0xa/0x20 [<ffffffff81641254>] ? kernel_init+0x0/0x1a2 [<ffffffff8100c8d0>] ? child_rip+0x0/0x20 kobject_create_and_add: kobject_add error: -17 (Probably the for_each_cpu loop should be entirely removed.) Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100827092006.GB5348@loge.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:35:49 +02:00
Andreas Herrmann	d9fadd7ba9	x86, AMD: Remove needless CPU family check (for L3 cache info) Old 32-bit AMD CPUs (all w/o L3 cache) should always return 0 for cpuid_edx(0x80000006). For unknown reason the 32-bit implementation differed from the 64-bit implementation. See commit `67cddd9479` ("i386: Add L3 cache support to AMD CPUID4 emulation"). The current check is the result of the x86 merge. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <20100902133710.GA5449@loge.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:33:48 +02:00
Borislav Petkov	260133ab65	x86, GART: Disable GART table walk probes Current code tramples over bit F3x90[6] which can be used to disable GART table walk probes. However, this bit should be set for performance reasons (speed up GART table walks). We are allowed to do that since we put GART tables in UC memory later anyway. Make it so. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1283531981-7495-3-git-send-email-bp@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:28:34 +02:00
Borislav Petkov	57ab43e331	x86, GART: Remove superfluous AMD64_GARTEN There is a GARTEN so use that and drop the duplicate. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1283531981-7495-2-git-send-email-bp@amd64.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:28:34 +02:00
Francisco Jerez	cc1a8e5233	x86: Fix the address space annotations of iomap_atomic_prot_pfn() This patch fixes the sparse warnings when the return pointer of iomap_atomic_prot_pfn() is used as an argument of iowrite32() and friends. Signed-off-by: Francisco Jerez <currojerez@riseup.net> LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:26:14 +02:00
Wu Fengguang	1c5f50ee34	x86, mm: fix uninitialized addr in kernel_physical_mapping_init() This re-adds the lost chunk in commit `9b861528a8`. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haicheng Li <haicheng.li@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20100903090407.GA19771@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 11:40:11 +02:00
Jan Beulich	7fe977dab3	i386: Make kernel_execve() suitable for stack unwinding The explicit saving and restoring of %ebx was confusing stack unwind data consumers, and it is plain unnecessary to do this within the asm(), since that was only introduced for PIC user mode consumers of the original _syscall3() macro this was derived from. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Arnd Bergmann <arnd@arndb.de> LKML-Reference: <4C7FBC660200007800013F95@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:16:02 +02:00
Jan Beulich	df5d1874ce	x86: Use {push,pop}{l,q}_cfi in more places ... plus additionally introduce {push,pop}f{l,q}_cfi. All in the hope that the code becomes better readable this way (it gets quite a bit smaller in any case). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:11 +02:00
Jan Beulich	a34107b557	i386: Add unwind directives to syscall ptregs stubs When these stubs are actual functions (i.e. having a return instruction) and have stack manipulation instructions in them, they should also be annotated to allow unwinding through them. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBCF00200007800013F99@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:10 +02:00
Jan Beulich	b1cccb1bb0	x86-64: Use symbolics instead of raw numbers in entry_64.S ... making the code a little less fragile. Also use pushq_cfi instead of raw CFI annotations in two more places, and add two missing annotations after stack pointer adjustments which got modified here anyway. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBACF0200007800013F6A@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:10 +02:00
Jan Beulich	1f130a783a	x86-64: Adjust frame type at paranoid_exit: As this isn't an exception or interrupt entry point, it doesn't have any of the hardware provide frame layouts active. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:10 +02:00
Jan Beulich	e6b04b6b5a	x86-64: Fix unwind annotations in syscall stubs With the return address removed from the stack, these should really refer to their caller's register state. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4C7FBA3D0200007800013F61@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:14:09 +02:00
Robert Richter	4177c42a63	perf, x86: Try to handle unknown nmis with an enabled PMU When the PMU is enabled it is valid to have unhandled nmis, two events could trigger 'simultaneously' raising two back-to-back NMIs. If the first NMI handles both, the latter will be empty and daze the CPU. The solution to avoid an 'unknown nmi' massage in this case was simply to stop the nmi handler chain when the PMU is enabled by stating the nmi was handled. This has the drawback that a) we can not detect unknown nmis anymore, and b) subsequent nmi handlers are not called. This patch addresses this. Now, we check this unknown NMI if it could be a PMU back-to-back NMI. Otherwise we pass it and let the kernel handle the unknown nmi. This is a debug log: cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430 cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616 cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320 cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139 cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100 cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607 cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416 cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032 cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830 cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743 cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552 cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224 cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677 cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772 cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818 cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591 Uhhuh. NMI received for unknown reason 00 on CPU 6. Do you have a strange power saving mode enabled? Dazed and confused, but trying to continue Deltas: nmi #32334 340186 nmi #32335 1327704 nmi #32336 1819 <<<< back-to-back nmi [1] nmi #32337 85961 nmi #32338 284507 nmi #32339 1578809 nmi #32340 217616 nmi #32341 1798 <<<< back-to-back nmi [2] nmi #32342 240913 nmi #32343 1512809 nmi #32344 116672 nmi #32345 412453 nmi #32346 1462095 <<<< 1st nmi (standard) handling 2 counters nmi #32347 2046 <<<< 2nd nmi (back-to-back) handling one counter nmi #32348 1773 <<<< 3rd nmi (back-to-back) handling no counter! [3] For back-to-back nmi detection there are the following rules: The PMU nmi handler was handling more than one counter and no counter was handled in the subsequent nmi (see [1] and [2] above). There is another case if there are two subsequent back-to-back nmis [3]. The 2nd is detected as back-to-back because the first handled more than one counter. If the second handles one counter and the 3rd handles nothing, we drop the 3rd nmi because it could be a back-to-back nmi. Signed-off-by: Robert Richter <robert.richter@amd.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ] Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:05:18 +02:00
Peter Zijlstra	de725dec9d	perf, x86: Fix handle_irq return values Now that we rely on the number of handled overflows, ensure all handle_irq implementations actually return the right number. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: robert.richter@amd.com Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:05:18 +02:00
Don Zickus	2e556b5b32	perf, x86: Fix accidentally ack'ing a second event on intel perf counter During testing of a patch to stop having the perf subsytem swallow nmis, it was uncovered that Nehalem boxes were randomly getting unknown nmis when using the perf tool. Moving the ack'ing of the PMI closer to when we get the status allows the hardware to properly re-set the PMU bit signaling another PMI was triggered during the processing of the first PMI. This allows the new logic for dealing with the shortcomings of multiple PMIs to handle the extra NMI by 'eat'ing it later. Now one can wonder why are we getting a second PMI when we disable all the PMUs in the begining of the NMI handler to prevent such a case, for that I do not know. But I know the fix below helps deal with this quirk. Tested on multiple Nehalems where the problem was occuring. With the patch, the code now loops a second time to handle the second PMI (whereas before it was not). Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: peterz@infradead.org Cc: robert.richter@amd.com Cc: gorcunov@gmail.com Cc: fweisbec@gmail.com Cc: ying.huang@intel.com Cc: ming.m.lin@intel.com Cc: eranian@google.com LKML-Reference: <1283454469-1909-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 08:05:17 +02:00
Ingo Molnar	b4c69d45c4	Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2010-09-01 22:31:07 +02:00
Robert Richter	269f45c250	oprofile, x86: fix init_sysfs() function stub The use of the return value of init_sysfs() with commit `10f0412` oprofile, x86: fix init_sysfs error handling discovered the following build error for !CONFIG_PM: .../linux/arch/x86/oprofile/nmi_int.c: In function ‘op_nmi_init’: .../linux/arch/x86/oprofile/nmi_int.c:784: error: expected expression before ‘do’ make[2]: * [arch/x86/oprofile/nmi_int.o] Error 1 make[1]: * [arch/x86/oprofile] Error 2 This patch fixes this. Reported-by: Ingo Molnar <mingo@elte.hu> Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-09-01 21:23:01 +02:00
Cyrill Gorcunov	c9cf4a019c	perf, x86, Pentium4: Add RAW events verification Implements verification of - Bits of ESCR EventMask field (meaningful bits in field are hardware predefined and others bits should be set to zero) - INSTR_COMPLETED event (it is available on predefined cpu model only) - Thread shared events (they should be guarded by "perf_event_paranoid" sysctl due to security reason). The side effect of this action is that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Tested-by: Lin Ming <ming.m.lin@intel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100825182334.GB14874@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-01 08:26:56 +02:00
Robert Richter	10f0412f57	oprofile, x86: fix init_sysfs error handling On failure init_sysfs() might not properly free resources. The error code of the function is not checked. And, when reinitializing the exit function might be called twice. This patch fixes all this. Cc: stable@kernel.org Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-08-31 10:26:26 +02:00
Ingo Molnar	daab7fc734	Merge commit 'v2.6.36-rc3' into x86/memblock Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-31 09:45:46 +02:00
Konrad Rzeszutek Wilk	7ac41ccf47	x86, iommu: Fix IOMMU_INIT alignment rules This boot crash was observed: DMA-API: preallocated 32768 debug entries DMA-API: debugging enabled by kernel config BUG: unable to handle kernel paging request at 19da8955 IP: [<f4ffffff>] 0xf4ffffff *pde = 00000000 The crux of the failure was that even if we did not use any of the .iommu_table section, the linker would still insert it in the vmlinux file. This patch fixes that and also fixes the runtime crash where we would try to access the array. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <1283191802-25086-1-git-send-email-konrad.wilk@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-31 08:06:10 +02:00
Julia Lawall	9fbaf49c7f	x86, kmemcheck: Remove double test The opcodes 0x2e and 0x3e are tested for in the first Group 2 line as well. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @expression@ expression E; @@ ( * E \|\| ... \|\| E \| * E && ... && E ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> LKML-Reference: <1283010066-20935-5-git-send-email-julia@diku.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-30 09:19:28 +02:00
Konrad Rzeszutek Wilk	6f44d0337c	x86, doc: Adding comments about .iommu_table and its neighbors. Updating the linker section with comments about .iommu_table and some other ones that I know of. CC: Sam Ravnborg <sam@ravnborg.org> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282933173-19960-1-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 18:14:31 -07:00
Yinghai Lu	774ea0bcb2	x86: Remove old bootmem code Requested by Ingo, Thomas and HPA. The old bootmem code is no longer necessary, and the transition is complete. Remove it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:14:37 -07:00
Yinghai Lu	6f2a75369e	x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve memblock_memory_size() will return memory size in memblock.memory.region. memblock_free_memory_size() will return free memory size in memblock.memory.region. So We can get exact reseved size in specified range. Set the size right after initmem_init(), because later bootmem API will get area above 16M. (except some fallback). Later after we remove the bootmem, We could call that just before paging_init(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:54 -07:00
Yinghai Lu	a587d2daeb	x86: Remove not used early_res code and some functions in e820.c that are not used anymore Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:51 -07:00
Yinghai Lu	a9ce6bc151	x86, memblock: Replace e820_/_early string with memblock_ 1.include linux/memblock.h directly. so later could reduce e820.h reference. 2 this patch is done by sed scripts mainly -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:47 -07:00
Yinghai Lu	72d7c3b33c	x86: Use memblock to replace early_res 1. replace find_e820_area with memblock_find_in_range 2. replace reserve_early with memblock_x86_reserve_range 3. replace free_early with memblock_x86_free_range. 4. NO_BOOTMEM will switch to use memblock too. 5. use _e820, _early wrap in the patch, in following patch, will replace them all 6. because memblock_x86_free_range support partial free, we can remove some special care 7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill() so adjust some calling later in setup.c::setup_arch() -- corruption_check and mptable_update -v2: Move reserve_brk() early Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range() that could happen We have more then 128 RAM entry in E820 tables, and memblock_x86_fill() could use memblock_find_in_range() to find a new place for memblock.memory.region array. and We don't need to use extend_brk() after fill_memblock_area() So move reserve_brk() early before fill_memblock_area(). -v3: Move find_smp_config early To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable in right place. -v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in memblock.reserved already.. use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later. -v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit active_region for 32bit does include high pages need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped() -v6: Use current_limit instead -v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L -v8: Set memblock_can_resize early to handle EFI with more RAM entries -v9: update after kmemleak changes in mainline Suggested-by: David S. Miller <davem@davemloft.net> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:12:29 -07:00
Yinghai Lu	301ff3e88e	x86, memblock: Use memblock_debug to control debug message print out Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is specified will also print ther name when reserve_memblock_area/free_memblock_area are called. -v2: according to Ingo, put " if (memblock_debug) " in one place Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:11:35 -07:00
Yinghai Lu	e82d42be24	x86, memblock: Add memblock_x86_memory_in_range() It will return memory size in specified range according to memblock.memory.region Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to __memblock_x86_memory_in_range(). -v2: Ben want _in_range in the name instead of size Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:11:30 -07:00
Yinghai Lu	b52c17ce85	x86, memblock: Add memblock_x86_free_memory_in_range() It will return free memory size in specified range. We can not use memory_size - reserved_size here, because some reserved area may not be in the scope of memblock.memory.region. Use memblock.memory.region subtracting memblock.reserved.region to get free range array. then count size of all free ranges. -v2: Ben insist on using _in_range Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:11:16 -07:00
Yinghai Lu	6bcc8176d0	x86, memblock: Add memblock_x86_find_in_range_node() It can be used to find NODE_DATA for numa. Need to make sure early_node_map[] is filled before it is called, otherwise it will fallback to memblock_find_in_range(), with node range. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:57 -07:00
Yinghai Lu	88ba088c18	x86, memblock: Add memblock_x86_register_active_regions() and memblock_x86_hole_size() memblock_x86_register_active_regions() will be used to fill early_node_map, the result will be memblock.memory.region AND numa data memblock_x86_hole_size will be used to find hole size on memblock.memory.region with specified range. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:52 -07:00
Yinghai Lu	4d5cf86ce1	x86, memblock: Add get_free_all_memory_range() get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by free_all_memory_core_early(). It will use early_node_map aka active ranges subtract memblock.reserved to get all free range, and those ranges will convert to slab pages. -v4: increase range size Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:48 -07:00
Yinghai Lu	9dc5d569c1	x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_range They are wrappers for core versions, which take start/end/name instead of base/size. This will make x86 conversion eaasier. could add more debug print out -v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael Ellerman and Ben change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman -v3: call check_and_double after reserve/free, so could avoid to use find_memblock_area. Suggested by Michael Ellerman Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:41 -07:00
Yinghai Lu	27de794365	x86, memblock: Add memblock_x86_to_bootmem() memblock_x86_to_bootmem() will reserve memblock.reserved.region in bootmem after bootmem is set up. We can use it to with all arches that support memblock later. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:21 -07:00
Yinghai Lu	f88eff74aa	bootmem, x86: Add weak version of reserve_bootmem_generic It will be used memblock_x86_to_bootmem converting It is an wrapper for reserve_bootmem, and x86 64bit is using special one. Also clean up that version for x86_64. We don't need to take care of numa path for that, bootmem can handle it how Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:13 -07:00
Yinghai Lu	fb74fb6db9	x86, memblock: Add memblock_x86_find_in_range_size() size is returned according free range. Will be used to find free ranges for early_memtest and memory corruption check Do not mess it up with lib/memblock.c yet. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:06 -07:00
Frederic Weisbecker	98ee74a75c	Merge branch 'perf/urgent' into perf/core Conflicts: tools/perf/util/callchain.h Merge reason: Fix a non-trivial conflict with latest fixes	2010-08-27 02:30:07 +02:00
Shaohua Li	660a293ea9	x86, mm: Make spurious_fault check explicitly check the PRESENT bit pte_present() returns true even present bit isn't set but _PAGE_PROTNONE (global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have global bit set but present bit clear. This patch makes we could catch free pages access with CONFIG_DEBUG_PAGEALLOC enabled. [ hpa: added a comment in the code as a warning to janitors ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1280217988.32400.75.camel@sli10-desk.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 16:00:21 -07:00
Konrad Rzeszutek Wilk	ee1f284f38	x86, iommu: Utilize the IOMMU_INIT macros functionality. We remove all of the sub-platform detection/init routines and instead use on the .iommu_table array of structs to call the .early_init if .detect returned a positive value. Also we can stop detecting other IOMMUs if the IOMMU used the _FINISH type macro. During the 'pci_iommu_init' stage, we call .init for the second-stage initialization if it was defined. Currently only SWIOTLB has this defined and it used to de-allocate the SWIOTLB if the other detected IOMMUs have deemed it unnecessary to use SWIOTLB. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-11-git-send-email-konrad.wilk@oracle.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:14:52 -07:00
Konrad Rzeszutek Wilk	22e6daf41b	x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros. We utilize the IOMMU_INIT macros to create this dependency: [null] \| [pci_xen_swiotlb_detect] \| [pci_swiotlb_detect_override] \| [pci_swiotlb_detect_4gb] \| +-------+--------+ / \ [detect_calgary] [gart_iommu_hole_init] \| [amd_iommu_detect] Meaning that 'amd_iommu_detect' will be called after 'gart_iommu_hole_init'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-9-git-send-email-konrad.wilk@oracle.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Joerg Roedel <joerg.roedel@amd.com> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:14:30 -07:00
Konrad Rzeszutek Wilk	d2aa232f3d	x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros. We utilize the IOMMU_INIT macros to create this dependency: [pci_xen_swiotlb_detect] \| [pci_swiotlb_detect_override] \| [pci_swiotlb_detect_4gb] \| [detect_calgary] Meaning that 'detect_calgary' is going to be called after 'pci_swiotlb_detect'. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-8-git-send-email-konrad.wilk@oracle.com> CC: Muli Ben-Yehuda <muli@il.ibm.com> CC: "Jon D. Mason" <jdmason@kudzu.us> CC: "Darrick J. Wong" <djwong@us.ibm.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:14:15 -07:00
Konrad Rzeszutek Wilk	5cb3a26793	x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros. We utilize the IOMMU_INIT macros to create this dependency: [null] \| [pci_xen_swiotlb_detect] \| [pci_swiotlb_detect_override] \| [pci_swiotlb_detect_4gb] In other words, we set 'pci_xen_swiotlb_detect' to be the first detection to be run during start. CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-7-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:14:06 -07:00
Konrad Rzeszutek Wilk	c116c5457c	x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros. We utilize the IOMMU_INIT macros to create this dependency: [pci_xen_swiotlb_detect] \| [pci_swiotlb_detect_override] \| [pci_swiotlb_detect_4gb] And set the SWIOTLB IOMMU_INIT to utilize 'pci_swiotlb_init' for .init and 'pci_swiotlb_late_init' for .late_init. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-6-git-send-email-konrad.wilk@oracle.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:13:37 -07:00
Konrad Rzeszutek Wilk	efa631c26d	x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine. In 'pci_swiotlb_detect' we used to do two different things: a). If user provided 'iommu=soft' or 'swiotlb=force' we would set swiotlb=1 and return 1 (and forcing pci-dma.c to call pci_swiotlb_init() immediately). b). If 4GB or more would be detected and if user did not specify iommu=off, we would set 'swiotlb=1' and return whatever 'a)' figured out. We simplify this by splitting a) and b) in two different routines. CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-5-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:13:29 -07:00
Konrad Rzeszutek Wilk	5bef80a4b8	x86, iommu: Add proper dependency sort routine (and sanity check). We are using a very simple sort routine which sorts the .iommu_table array in the order of dependencies. Specifically each structure of iommu_table_entry has a field 'depend' which contains the function pointer to the IOMMU that MUST be run before us. We sort the array of structures so that the struct iommu_table_entry with no 'depend' field are first, and then the subsequent ones are the ones for which the 'depend' function has been already invoked (in other words, precede us). Using the kernel's version 'sort', which is a mergeheap is feasible, but would require making the comparison operator scan recursivly the array to satisfy the "heapify" process: setting the levels properly. The end result would much more complex than it should be an it is just much simpler to utilize this simple sort routine. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-4-git-send-email-konrad.wilk@oracle.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:13:19 -07:00
Konrad Rzeszutek Wilk	480125ba49	x86, iommu: Make all IOMMU's detection routines return a value. We return 1 if the IOMMU has been detected. Zero or an error number if we failed to find it. This is in preperation of using the IOMMU_INIT so that we can detect whether an IOMMU is present. I have not tested this for regression on Calgary, nor on AMD Vi chipsets as I don't have that hardware. CC: Muli Ben-Yehuda <muli@il.ibm.com> CC: "Jon D. Mason" <jdmason@kudzu.us> CC: "Darrick J. Wong" <djwong@us.ibm.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> CC: David Woodhouse <David.Woodhouse@intel.com> CC: Chris Wright <chrisw@sous-sol.org> CC: Yinghai Lu <yinghai@kernel.org> CC: Joerg Roedel <joerg.roedel@amd.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-3-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:13:13 -07:00
Konrad Rzeszutek Wilk	0444ad93ea	x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure This patch set adds a mechanism to "modularize" the IOMMUs we have on X86. Currently the count of IOMMUs is up to six and they have a complex relationship that requires careful execution order. 'pci_iommu_alloc' does that today, but most folks are unhappy with how it does it. This patch set addresses this and also paves a mechanism to jettison unused IOMMUs during run-time. For details that sparked this, please refer to: http://lkml.org/lkml/2010/8/2/282 The first solution that comes to mind is to convert wholesale the IOMMU detection routines to be called during initcall time frame. Unfortunately that misses the dependency relationship that some of the IOMMUs have (for example: for AMD-Vi IOMMU to work, GART detection MUST run first, and before all of that SWIOTLB MUST run). The second solution would be to introduce a registration call wherein the IOMMU would provide its detection/init routines and as well on what MUST run before it. That would work, except that the 'pci_iommu_alloc' which would run through this list, is called during mem_init. This means we don't have any memory allocator, and it is so early that we haven't yet started running through the initcall_t list. This solution borrows concepts from the 2nd idea and from how MODULE_INIT works. A macro is provided that each IOMMU uses to define it's detect function and early_init (before the memory allocate is active), and as well what other IOMMU MUST run before us. Since most IOMMUs depend on having SWIOTLB run first ("pci_swiotlb_detect") a convenience macro to depends on that is also provided. This macro is similar in design to MODULE_PARAM macro wherein we setup a .iommu_table section in which we populate it with the values that match a struct iommu_table_entry. During bootup we will sort through the array so that the IOMMUs that MUST run before us are first elements in the array. And then we just iterate through them calling the detection routine and if appropiate, the init routines. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com> CC: H. Peter Anvin <hpa@zytor.com> CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 15:12:53 -07:00
Haicheng Li	9b861528a8	x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: 0010:[<ffffffff810ace35>] [<ffffffff810ace35>] alloc_arraycache+0x35/0x69 [ 1139.243292] RSP: 0018:ffff8802799f9d78 EFLAGS: 00010286 [ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000 [ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010 [ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000 [ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0 [ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001 [ 1139.243308] FS: 00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000 [ 1139.243311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0 [ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1139.243321] Call Trace: [ 1139.243324] [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69 [ 1139.243328] [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a [ 1139.243333] [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b [ 1139.243337] [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb [ 1139.243340] [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152 [ 1139.243344] [<ffffffff813388ce>] ? store_online+0x4d/0x75 [ 1139.243348] [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d [ 1139.243351] [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121 [ 1139.243355] [<ffffffff810b539d>] ? vfs_write+0xae/0x14a [ 1139.243358] [<ffffffff810b587f>] ? sys_write+0x47/0x6f [ 1139.243362] [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b This patch makes sure to always replicate new direct mapping PGD entries to the PGDs of all processes, as well as ensures corresponding vmemmap mapping gets synced. V1: initial code by Andi Kleen. V2: fix several issues found in testing. V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all(). [ hpa: changed pgd_change from int to bool ] Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4FD8.6080100@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 14:02:33 -07:00
Haicheng Li	6afb5157b9	x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 14:02:29 -07:00
H. Peter Anvin	9ea77bdb39	x86, bios: Make the x86 early memory reservation a kernel option Add a kernel command-line option so the x86 early memory reservation size can be adjusted at runtime instead of only at compile time. Suggested-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <tip-d0cd7425fab774a480cce17c2f649984312d0b55@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-25 17:10:49 -07:00
Borislav Petkov	acf01734b1	x86, tsc: Remove CPU frequency calibration on AMD `6b37f5a20c` introduced the CPU frequency calibration code for AMD CPUs whose TSCs didn't increment with the core's P0 frequency. From F10h, revB onward, however, the TSC increment rate is denoted by MSRC001_0015[24] and when this bit is set (which should be done by the BIOS) the TSC increments with the P0 frequency so the calibration is not needed and booting can be a couple of mcecs faster on those machines. Besides, there should be virtually no machines out there which don't have this bit set, therefore this calibration can be safely removed. It is a shaky hack anyway since it assumes implicitly that the core is in P0 when BIOS hands off to the OS, which might not always be the case. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100825162823.GE26438@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-25 13:32:52 -07:00
Linus Torvalds	d4348c6789	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag tracing/trace_stack: Fix stack trace on ppc64	2010-08-25 10:50:07 -07:00
Linus Torvalds	5e686019df	Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states sched: Fix rq->clock synchronization when migrating tasks	2010-08-25 08:40:56 -07:00
Lin Ming	8d33091992	perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens on every event, which can generate a flood of NMIs. Clear it. Reported-by: Vince Weaver <vweaver1@eecs.utk.edu> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-25 15:15:33 +02:00
Ingo Molnar	7de5d895b2	Merge branch 'linus' into perf/core Merge reason: pick up perf fixes Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-25 13:10:00 +02:00
Lin Ming	04fba67163	perf: Remove unused variable This fixes the following build warning introduced by the callchain rework: arch/x86/kernel/cpu/perf_event.c:1574: warning: ‘perf_callchain_entry_nmi’ defined but not used Signed-off-by: Lin Ming <ming.m.lin@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1282718949.16443.75.camel@minggr.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-25 13:09:41 +02:00
Hugh Dickins	b7d4608977	x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT laptop: whereas cloning from kernel to low mappings pgd range does need to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY. Signed-off-by: Hugh Dickins <hughd@google.com> LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-24 23:05:17 -07:00
H. Peter Anvin	d0cd7425fa	x86, bios: By default, reserve the low 64K for all BIOSes The laundry list of BIOSes that need the low 64K reserved is getting very long, so make it the default across all BIOSes. This also allows the code to be simplified and unified with the reservation code for the first 4K. This resolves kernel bugzilla 16661 and who knows what else... Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-*@git.kernel.org>	2010-08-24 17:32:04 -07:00
Linus Torvalds	c05e1e23b8	Merge branch 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6 * 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6: xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary xen: pvhvm: allow user to request no emulated device unplug	2010-08-23 18:29:18 -07:00
Alok Kataria	b0f4c062fb	x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI VMI was the only user of the alloc_pmd_clone hook, given that VMI is now removed we can also remove this hook. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-23 17:09:44 -07:00
Alok Kataria	9863c90f68	x86, vmware: Remove deprecated VMI kernel support With the recent innovations in CPU hardware acceleration technologies from Intel and AMD, VMware ran a few experiments to compare these techniques to guest paravirtualization technique on VMware's platform. These hardware assisted virtualization techniques have outperformed the performance benefits provided by VMI in most of the workloads. VMware expects that these hardware features will be ubiquitous in a couple of years, as a result, VMware has started a phased retirement of this feature from the hypervisor. Please note that VMI has always been an optimization and non-VMI kernels still work fine on VMware's platform. Latest versions of VMware's product which support VMI are, Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence releases for these products will continue supporting VMI. For more details about VMI retirement take a look at this, http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html This feature removal was scheduled for 2.6.37 back in September 2009. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-23 15:18:50 -07:00
Ma Ling	59daa706fb	x86, mem: Optimize memcpy by avoiding memory false dependece All read operations after allocation stage can run speculatively, all write operation will run in program order, and if addresses are different read may run before older write operation, otherwise wait until write commit. However CPU don't check each address bit, so read could fail to recognize different address even they are in different page.For example if rsi is 0xf004, rdi is 0xe008, in following operation there will generate big performance latency. 1. movq (%rsi), %rax 2. movq %rax, (%rdi) 3. movq 8(%rsi), %rax 4. movq %rax, 8(%rdi) If %rsi and rdi were in really the same meory page, there are TRUE read-after-write dependence because instruction 2 write 0x008 and instruction 3 read 0x00c, the two address are overlap partially. Actually there are in different page and no any issues, but without checking each address bit CPU could think they are in the same page, and instruction 3 have to wait for instruction 2 to write data into cache from write buffer, then load data from cache, the cost time read spent is equal to mfence instruction. We may avoid it by tuning operation sequence as follow. 1. movq 8(%rsi), %rax 2. movq %rax, 8(%rdi) 3. movq (%rsi), %rax 4. movq %rax, (%rdi) Instruction 3 read 0x004, instruction 2 write address 0x010, no any dependence. At last on Core2 we gain 1.83x speedup compared with original instruction sequence. In this patch we first handle small size(less 20bytes), then jump to different copy mode. Based on our micro-benchmark small bytes from 1 to 127 bytes, we got up to 2X improvement, and up to 1.5X improvement for 1024 bytes on Corei7. (We use our micro-benchmark, and will do further test according to your requirment) Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-23 14:56:41 -07:00
Ma, Ling	fdf4289679	x86, mem: Don't implement forward memmove() as memcpy() memmove() allow source and destination address to be overlap, but there is no such limitation for memcpy(). Therefore, explicitly implement memmove() in both the forwards and backward directions, to give us the ability to optimize memcpy(). Signed-off-by: Ma Ling <ling.ma@intel.com> LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-23 14:14:27 -07:00
Shaohua Li	61c77326d1	x86, mm: Avoid unnecessary TLB flush In x86, access and dirty bits are set automatically by CPU when CPU accesses memory. When we go into the code path of below flush_tlb_fix_spurious_fault(), we already set dirty bit for pte and don't need flush tlb. This might mean tlb entry in some CPUs hasn't dirty bit set, but this doesn't matter. When the CPUs do page write, they will automatically check the bit and no software involved. On the other hand, flush tlb in below position is harmful. Test creates CPU number of threads, each thread writes to a same but random address in same vma range and we measure the total time. Under a 4 socket system, original time is 1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is 20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for tlb flush. Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrea Archangeli <aarcange@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-23 10:04:57 -07:00
Ian Campbell	1dc7ce99b0	xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary It is not immediately clear what this option causes to become ignored. The actual meaning is that it is not necessary to unplug the emulated devices to safely use the PV ones, even if the platform does not support the unplug protocol. (pressumably the user will only add this option if they have ensured that their domain configuration is safe). I think xen_emul_unplug=unnecessary better captures this. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>	2010-08-23 11:59:29 +01:00
Ian Campbell	c93a4dfb31	xen: pvhvm: allow user to request no emulated device unplug this allows the user to disable pvhvm and revert to emulated devices in case of a system misconfiguration (e.g. initramfs with only emulated drivers in it). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>	2010-08-23 11:59:28 +01:00
Linus Torvalds	3dc8d7f07e	Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: PIT: free irq source id in handling error path KVM: destroy workqueue on kvm_create_pit() failures KVM: fix poison overwritten caused by using wrong xstate size	2010-08-22 11:27:36 -07:00
Samuel Thibault	ddb0c5a689	Replace Configure with Enable in description of MAXSMP The "Configure" word tends to make user believe they have to say 'yes' to be able to choose the number of procs/nodes. "Enable" should be unambiguous enough. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-21 12:38:58 -07:00
Sergey Senozhatsky	51e3c1b558	x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev Fix BUG: using smp_processor_id() in preemptible thermal_throttle_add_dev. We know the cpu number when calling thermal_throttle_add_dev, so we can remove smp_processor_id call in thermal_throttle_add_dev by supplying the cpu number as argument. This should resolve kernel bugzilla 16615/16629. Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> LKML-Reference: <20100820073634.GB5209@swordfish.minsk.epam.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-20 19:56:00 -07:00
Linus Torvalds	36423a5ed5	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, apic: Fix apic=debug boot crash x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues x86-32: Fix dummy trampoline-related inline stubs x86-32: Separate 1:1 pagetables from swapper_pg_dir x86, cpu: Fix regression in AMD errata checking code	2010-08-20 14:25:08 -07:00
Peter Zijlstra	ed80526166	perf: Remove superfluous return values from perf_callchain_*() Fixes these build warnings introduced by the callchain rework: arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_kernel’: arch/x86/kernel/cpu/perf_event.c:1646: warning: ‘return’ with a value, in function returning void arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_user’: arch/x86/kernel/cpu/perf_event.c:1699: warning: ‘return’ with a value, in function returning void arch/x86/kernel/cpu/perf_event.c: At top level: arch/x86/kernel/cpu/perf_event.c:1607: warning: ‘perf_callchain_entry_nmi’ defined but not used Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-20 14:59:39 +02:00
Suresh Siddha	cd7240c0b9	x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states TSC's get reset after suspend/resume (even on cpu's with invariant TSC which runs at a constant rate across ACPI P-, C- and T-states). And in some systems BIOS seem to reinit TSC to arbitrary large value (still sync'd across cpu's) during resume. This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less than rq->age_stamp (introduced in 2.6.32). This leads to a big value returned by scale_rt_power() and the resulting big group power set by the update_group_power() is causing improper load balancing between busy and idle cpu's after suspend/resume. This resulted in multi-threaded workloads (like kernel-compilation) go slower after suspend/resume cycle on core i5 laptops. Fix this by recomputing cyc2ns_offset's during resume, so that sched_clock() continues from the point where it was left off during suspend. Reported-by: Florian Pritz <flo@xssn.at> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <stable@kernel.org> # [v2.6.32+] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-20 14:59:02 +02:00
Daniel Kiper	05e407603e	x86, apic: Fix apic=debug boot crash Fix a boot crash when apic=debug is used and the APIC is not properly initialized. This issue appears during Xen Dom0 kernel boot but the fix is generic and the crash could occur on real hardware as well. Signed-off-by: Daniel Kiper <dkiper@net-space.pl> Cc: xen-devel@lists.xensource.com Cc: konrad.wilk@oracle.com Cc: jeremy@goop.org Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-20 10:18:28 +02:00
Borislav Petkov	d7c53c9e82	x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d: Stuck ??" message due to multiple cores concurrently accessing the cpu_callin_mask, among others. Since these codepaths are not protected from concurrent access due to the fact that there's no sane reason for making an already complex code unnecessarily more complex - we hit the issue only when insanely switching cores off- and online - serialize hotplugging cores on the sysfs level and be done with it. [ v2.1: fix !HOTPLUG_CPU build ] Cc: <stable@kernel.org> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100819181029.GC17171@aftab> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-19 14:47:43 -07:00
Linus Torvalds	b3ea36b7a2	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kprobes/x86: Fix the return address of multiple kretprobes perf tools: Fix build error on read only source. perf, x86: Fix Intel-nhm PMU programming errata workaround	2010-08-19 09:06:49 -07:00
KUMANO Syuhei	737480a0d5	kprobes/x86: Fix the return address of multiple kretprobes Fix the return address of subsequent kretprobes when multiple kretprobes are set on the same function. For example: # cd /sys/kernel/debug/tracing # echo "r:event1 sys_symlink" > kprobe_events # echo "r:event2 sys_symlink" >> kprobe_events # echo 1 > events/kprobes/enable # ln -s /tmp/foo /tmp/bar (without this patch) # cat trace ln-897 [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink) ln-897 [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink) (with this patch) # cat trace ln-740 [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink) ln-740 [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink) Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> LKML-Reference: <1281853084.3254.11.camel@camp10-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-19 12:49:56 +02:00
Ingo Molnar	c8710ad389	Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2010-08-19 12:48:09 +02:00
Frederic Weisbecker	927c7a9e92	perf: Fix race in callchains Now that software events don't have interrupt disabled anymore in the event path, callchains can nest on any context. So seperating nmi and others contexts in two buffers has become racy. Fix this by providing one buffer per nesting level. Given the size of the callchain entries (2040 bytes * 4), we now need to allocate them dynamically. v2: Fixed put_callchain_entry call after recursion. Fix the type of the recursion, it must be an array. v3: Use a manual pr cpu allocation (temporary solution until NMIs can safely access vmalloc'ed memory). Do a better separation between callchain reference tracking and allocation. Make the "put" path lockless for non-release cases. v4: Protect the callchain buffers with rcu. v5: Do the cpu buffers allocations node affine. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David Miller <davem@davemloft.net> Cc: Borislav Petkov <bp@amd64.org>	2010-08-19 01:32:31 +02:00
Frederic Weisbecker	f72c1a931e	perf: Factorize callchain context handling Store the kernel and user contexts from the generic layer instead of archs, this gathers some repetitive code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>	2010-08-19 01:32:11 +02:00
Frederic Weisbecker	56962b4449	perf: Generalize some arch callchain code - Most archs use one callchain buffer per cpu, except x86 that needs to deal with NMIs. Provide a default perf_callchain_buffer() implementation that x86 overrides. - Centralize all the kernel/user regs handling and invoke new arch handlers from there: perf_callchain_user() / perf_callchain_kernel() That avoid all the user_mode(), current->mm checks and so... - Invert some parameters in perf_callchain_*() helpers: entry to the left, regs to the right, following the traditional (dst, src). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>	2010-08-19 01:30:59 +02:00
Frederic Weisbecker	70791ce9ba	perf: Generalize callchain_store() callchain_store() is the same on every archs, inline it in perf_event.h and rename it to perf_callchain_store() to avoid any collision. This removes repetitive code. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Paul Mackerras <paulus@samba.org> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>	2010-08-19 01:30:11 +02:00
Frederic Weisbecker	c1a65932fd	perf: Drop unappropriate tests on arch callchains Drop the TASK_RUNNING test on user tasks for callchains as this check doesn't seem to make any sense. Also remove the tests for !current that is not supposed to happen and current->pid as this should be handled at the generic level, with exclude_idle attribute. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> Cc: David Miller <davem@davemloft.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Borislav Petkov <bp@amd64.org>	2010-08-19 01:29:35 +02:00
H. Peter Anvin	8848a91068	x86-32: Fix dummy trampoline-related inline stubs Fix dummy inline stubs for trampoline-related functions when no trampolines exist (until we get rid of the no-trampoline case entirely.) Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Cc: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <4C6C294D.3030404@zytor.com>	2010-08-18 12:42:24 -07:00
Joerg Roedel	fd89a13792	x86-32: Separate 1:1 pagetables from swapper_pg_dir This patch fixes machine crashes which occur when heavily exercising the CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by AMD Erratum 383 and result in a fatal machine check exception. Here's the scenario: 1. On 32-bit, the swapper_pg_dir page table is used as the initial page table for booting a secondary CPU. 2. To make this work, swapper_pg_dir needs a direct mapping of physical memory in it (the low mappings). By adding those low, large page (2M) mappings (PAE kernel), we create the necessary conditions for Erratum 383 to occur. 3. Other CPUs which do not participate in the off- and onlining game may use swapper_pg_dir while the low mappings are present (when leave_mm is called). For all steps below, the CPU referred to is a CPU that is using swapper_pg_dir, and not the CPU which is being onlined. 4. The presence of the low mappings in swapper_pg_dir can result in TLB entries for addresses below __PAGE_OFFSET to be established speculatively. These TLB entries are marked global and large. 5. When the CPU with such TLB entry switches to another page table, this TLB entry remains because it is global. 6. The process then generates an access to an address covered by the above TLB entry but there is a permission mismatch - the TLB entry covers a large global page not accessible to userspace. 7. Due to this permission mismatch a new 4kb, user TLB entry gets established. Further, Erratum 383 provides for a small window of time where both TLB entries are present. This results in an uncorrectable machine check exception signalling a TLB multimatch which panics the machine. There are two ways to fix this issue: 1. Always do a global TLB flush when a new cr3 is loaded and the old page table was swapper_pg_dir. I consider this a hack hard to understand and with performance implications 2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit does. This patch implements solution 2. It introduces a trampoline_pg_dir which has the same layout as swapper_pg_dir with low_mappings. This page table is used as the initial page table of the booting CPU. Later in the bringup process, it switches to swapper_pg_dir and does a global TLB flush. This fixes the crashes in our test cases. -v2: switch to swapper_pg_dir right after entering start_secondary() so that we are able to access percpu data which might not be mapped in the trampoline page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> LKML-Reference: <20100816123833.GB28147@aftab> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-18 09:17:20 -07:00
Hans Rosenfeld	07a7795ca2	x86, cpu: Fix regression in AMD errata checking code A bug in the family-model-stepping matching code caused the presence of errata to go undetected when OSVW was not used. This causes hangs on some K8 systems because the E400 workaround is not enabled. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-18 09:16:28 -07:00
Zhang, Yanmin	351af0725e	perf, x86: Fix Intel-nhm PMU programming errata workaround Fix the Errata AAK100/AAP53/BD53 workaround, the officialy documented workaround we implemented in: `11164cd`: perf, x86: Add Nehelem PMU programming errata workaround doesn't actually work fully and causes a stuck PMU state under load and non-functioning perf profiling. A functional workaround was found by trial & error. Affects all Nehalem-class Intel PMUs. Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1281073148.2125.63.camel@ymzhang.sh.intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <stable@kernel.org> # .35.x Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-18 11:17:39 +02:00
Linus Torvalds	392abeea52	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: vt,console,kdb: preserve console_blanked while in kdb vt: fix regression warnings from KMS merge arm,kgdb: fix GDB_MAX_REGS no longer used kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c kdb: fix compile error without CONFIG_KALLSYMS	2010-08-17 18:36:19 -07:00
David Howells	d7627467b7	Make do_execve() take a const filename pointer Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-17 18:07:43 -07:00
Jesse Barnes	23b90cfd7b	x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set Otherwise we'll duplicate definitions with the pci.h stubs. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-08-17 09:29:36 -07:00
Xiao Guangrong	6b5d7a9f6f	KVM: PIT: free irq source id in handling error path Free irq source id if create pit workqueue fail Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-17 12:04:23 +03:00
Namhyung Kim	8c8aefce93	kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c breakinfo->pev is a pointer to percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-08-16 15:58:30 -05:00
Linus Torvalds	2245ba2a3a	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: gcc-4.6: ACPI: fix unused but set variables in ACPI ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS ACPI processor: remove deprecated ACPI procfs I/F ACPI power_resource: remove unused procfs I/F ACPI: remove deprecated ACPI procfs I/F ACPI: introduce drivers/acpi/sysfs.c ACPI: introduce module parameter acpi.aml_debug_output ACPI: introduce drivers/acpi/debugfs.c ACPI, APEI, ERST debug support ACPI, APEI, Manage GHES as platform devices ACPI, APEI, Rename CPER and GHES severity constants ACPI, APEI, Fix a typo of error path of apei_resources_request ACPI / ACPICA: Fix reference counting problems with GPE handlers ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device ACPI / Sleep: Drop acpi_suspend_finish() ACPI / Sleep: Consolidate suspend and hibernation routines ACPI / Wakeup: Simplify enabling of wakeup devices ACPI / Sleep: Rework enabling wakeup devices ACPI / Sleep: Free NVS copy if suspending of devices fails Fixed up totally buggered "ACPI: fix unused but set variables in ACPI" patch that doesn't even compile in the merge. Thanks to Sedat Dilek <sedat.dilek@googlemail.com> for noticing the breakage before I even pulled. And a big "Grrr.." at Len for not even bothering to compile the tree before asking me to pull.	2010-08-15 17:37:07 -07:00
Xiaotian Feng	3185bf8c23	KVM: destroy workqueue on kvm_create_pit() failures kernel needs to destroy workqueue if kvm_create_pit() fails, otherwise after pit is freed, the workqueue is leaked. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-15 14:17:35 +03:00
Xiaotian Feng	f45755b834	KVM: fix poison overwritten caused by using wrong xstate size fpu.state is allocated from task_xstate_cachep, the size of task_xstate_cachep is xstate_size. xstate_size is set from cpuid instruction, which is often smaller than sizeof(struct xsave_struct). kvm is using sizeof(struct xsave_struct) to fill in/out fpu.state.xsave, as what we allocated for fpu.state is xstate_size, kernel will write out of memory and caused poison/redzone/padding overwritten warnings. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Reviewed-by: Sheng Yang <sheng@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Brian Gerst <brgerst@gmail.com> Cc: Avi Kivity <avi@redhat.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Sheng Yang <sheng@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-08-15 14:10:15 +03:00
Len Brown	95ee46aa86	Merge branch 'linus' into release Conflicts: drivers/acpi/debug.c Signed-off-by: Len Brown <len.brown@intel.com>	2010-08-15 01:06:31 -04:00
Sam Ravnborg	8b1bb90701	defconfig reduction Use the defconfig files generated by "make savedefconfig" for remaining defconfig files. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2010-08-14 22:26:53 +02:00
Sam Ravnborg	bf56fba670	archs: replace unifdef-y with header-y unifdef-y and header-y have same semantic, so drop unifdef-y Signed-off-by: Sam Ravnborg <sam@ravnborg.org>	2010-08-14 22:26:51 +02:00
Linus Torvalds	c206d44ffd	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Make kdump avoid stack dumps - fix !CONFIG_KEXEC breakage x86, UV: Initialize BAU hub map x86, UV: Make kdump avoid stack dumps	2010-08-13 18:00:25 -07:00
Linus Torvalds	78b148696d	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] acpi-cpufreq: add missing __percpu markup	2010-08-13 17:59:09 -07:00

... 16 17 18 19 20 ...

12946 Commits