linux

Author	SHA1	Message	Date
Ingo Molnar	30285c6f03	Merge branch 'x86/apic-cleanups' into x86/urgent Merge reason: Topic is ready for upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-10 09:34:50 +01:00
Ingo Molnar	4385428a47	Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent	2011-01-09 10:42:21 +01:00
Cyrill Gorcunov	047a3772fe	perf, x86: P4 PMU - Fix unflagged overflows handling Don found that P4 PMU reads CCCR register instead of counter itself (in attempt to catch unflagged event) this makes P4 NMI handler to consume all NMIs it observes. So the other NMI users such as kgdb simply have no chance to get NMI on their hands. Side note: at the moment there is no way to run nmi-watchdog together with perf tool. This is because both 'perf top' and nmi-watchdog use the same event. So while nmi-watchdog reserves one event/counter for own needs there is no room for perf tool left (there is a way to disable nmi-watchdog on boot of course). Ming has tested this patch with the following results \| 1. watchdog disabled \| \| kgdb tests on boot OK \| perf works OK \| \| 2. watchdog enabled, without patch perf-x86-p4-nmi-4 \| \| kgdb tests on boot hang \| \| 3. watchdog enabled, without patch perf-x86-p4-nmi-4 and do not run kgdb \| tests on boot \| \| "perf top" partially works \| cpu-cycles no \| instructions yes \| cache-references no \| cache-misses no \| branch-instructions no \| branch-misses yes \| bus-cycles no \| \| 4. watchdog enabled, with patch perf-x86-p4-nmi-4 applied \| \| kgdb tests on boot OK \| perf does not work, NMI "Dazed and confused" messages show up \| Which means we still have problems with p4 box due to 'unknown' nmi happens but at least it should fix kgdb test cases. Reported-by: Jason Wessel <jason.wessel@windriver.com> Reported-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Don Zickus <dzickus@redhat.com> Acked-by: Lin Ming <ming.m.lin@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <4D275E7E.3040903@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-09 10:40:52 +01:00
Randy Dunlap	91d88ce22b	x86: Fix sparse non-ANSI function warnings in smpboot.c Fix sparse warning for non-ANSI function declaration: arch/x86/kernel/smpboot.c:100:30: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_lock' arch/x86/kernel/smpboot.c:105:32: warning: non-ANSI function declaration of function 'cpu_hotplug_driver_unlock' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> LKML-Reference: <20110108195914.95d366ea.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-09 10:15:19 +01:00
Linus Torvalds	72eb6a7914	Merge branch 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits) gameport: use this_cpu_read instead of lookup x86: udelay: Use this_cpu_read to avoid address calculation x86: Use this_cpu_inc_return for nmi counter x86: Replace uses of current_cpu_data with this_cpu ops x86: Use this_cpu_ops to optimize code vmstat: User per cpu atomics to avoid interrupt disable / enable irq_work: Use per cpu atomics instead of regular atomics cpuops: Use cmpxchg for xchg to avoid lock semantics x86: this_cpu_cmpxchg and this_cpu_xchg operations percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support percpu,x86: relocate this_cpu_add_return() and friends connector: Use this_cpu operations xen: Use this_cpu_inc_return taskstats: Use this_cpu_ops random: Use this_cpu_inc_return fs: Use this_cpu_inc_return in buffer.c highmem: Use this_cpu_xx_return() operations vmstat: Use this_cpu_inc_return for vm statistics x86: Support for this_cpu_add, sub, dec, inc_return percpu: Generic support for this_cpu_add, sub, dec, inc_return ... Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c} as per Tejun.	2011-01-07 17:02:58 -08:00
Frederic Weisbecker	625dbc3b8a	x86: Save rbp in pt_regs on irq entry From the x86_64 low level interrupt handlers, the frame pointer is saved right after the partial pt_regs frame. rbp is not supposed to be part of the irq partial saved registers, but it only requires to extend the pt_regs frame by 8 bytes to do so, plus a tiny stack offset fixup on irq exit. This changes a bit the semantics or get_irq_entry() that is supposed to provide only the value of caller saved registers and the cpu saved frame. However it's a win for unwinders that can walk through stack frames on top of get_irq_regs() snapshots. A noticeable impact is that it makes perf events cpu-clock and task-clock events based callchains working on x86_64. Let's then save rbp into the irq pt_regs. As a result with: perf record -e cpu-clock perf bench sched messaging perf report --stdio Before: 20.94% perf [kernel.kallsyms] [k] lock_acquire \| --- lock_acquire \| \|--44.01%-- __write_nocancel \| \|--43.18%-- __read \| \|--6.08%-- fork \| create_worker \| \|--0.88%-- _dl_fixup \| \|--0.65%-- do_lookup_x \| \|--0.53%-- __GI___libc_read --4.67%-- [...] After: 19.23% perf [kernel.kallsyms] [k] __lock_acquire \| --- __lock_acquire \| \|--97.74%-- lock_acquire \| \| \| \|--21.82%-- _raw_spin_lock \| \| \| \| \| \|--37.26%-- unix_stream_recvmsg \| \| \| sock_aio_read \| \| \| do_sync_read \| \| \| vfs_read \| \| \| sys_read \| \| \| system_call \| \| \| __read \| \| \| \| \| \|--24.09%-- unix_stream_sendmsg \| \| \| sock_aio_write \| \| \| do_sync_write \| \| \| vfs_write \| \| \| sys_write \| \| \| system_call \| \| \| __write_nocancel v2: Fix cfi annotations. Reported-by: Soeren Sandmann Pedersen <sandmann@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Stephane Eranian <eranian@google.com> Cc: Jan Beulich <JBeulich@novell.com>	2011-01-07 17:40:56 +01:00
Rakib Mullick	39a6eebda2	x86, dumpstack: Fix unused variable warning In dump_stack function, bp isn't used anymore, which is introduced by commit `9c0729dc80`. This patch removes bp completely. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Soeren Sandmann <sandmann@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <AANLkTik9U_Z0WSZ7YjrykER_pBUfPDdgUUmtYx=R74nL@mail.gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2011-01-07 16:59:49 +01:00
Sheng Yang	d9b8ca8474	xen: HVM X2APIC support This patch is similar to Gleb Natapov's patch for KVM, which enables the hypervisor to emulate the x2apic feature for the guest. This way, the emulation of lapic is simpler with the x2apic interface (MSR), and faster. [v2: Re-organized 'xen_hvm_need_lapic' per Ian Campbell suggestion] Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-01-07 10:03:50 -05:00
Sheng Yang	2904ed8dd5	apic: Move hypervisor detection of x2apic to hypervisor.h Then we can reuse it for Xen later. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Avi Kivity <avi@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-01-07 10:03:49 -05:00
Don Zickus	f2fd43954a	x86, NMI: Clean-up default_do_nmi() Just re-arrange the code a bit to make it easier to follow what is going on. Basically un-negating the if-statement and swapping the code inside the if-statement with code outside. No functional changes. Originally-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-7-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:08:53 +01:00
Don Zickus	ab846f13f6	x86, NMI: Allow NMI reason io port (0x61) to be processed on any CPU In original NMI handler, NMI reason io port (0x61) is only processed on BSP. This makes it impossible to hot-remove BSP. To solve the issue, a raw spinlock is used to allow the port to be processed on any CPU. Originally-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-6-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:08:53 +01:00
Don Zickus	c410b83077	x86, NMI: Remove DIE_NMI_IPI With priorities in place and no one really understanding the difference between DIE_NMI and DIE_NMI_IPI, just remove DIE_NMI_IPI and convert everyone to DIE_NMI. This also simplifies default_do_nmi() a little bit. Instead of calling the die_notifier in both the if and else part, just pull it out and call it before the if-statement. This has the side benefit of avoiding a call to the ioport to see if there is an external NMI sitting around until after the (more frequent) internal NMIs are dealt with. Patch-Inspired-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-5-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:08:53 +01:00
Don Zickus	166d751479	x86, NMI: Add priorities to handlers In order to consolidate the NMI die_chain events, we need to setup the priorities for the die notifiers. I started by defining a bunch of common priorities that can be used by the notifier blocks. Then I modified the notifier blocks to use the newly created priorities. Now that the priorities are straightened out, it should be easier to remove the event DIE_NMI_IPI. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-4-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:08:52 +01:00
Don Zickus	673a6092ce	x86: Convert some devices to use DIE_NMIUNKNOWN There are a handful of places in the code that register a die_notifier as a catch-all in case no one claims the NMI. Unfortunately, they trigger on events like DIE_NMI and DIE_NMI_IPI, which depending on when they registered may collide with other handlers that have the ability to determine if the NMI is theirs or not. The function unknown_nmi_error() makes one last effort to walk the die_chain when no one else has claimed the NMI before spitting out messages that the NMI is unknown. This is a better spot for these devices to execute any code without colliding with the other handlers. The two drivers modified are only compiled on x86 arches I believe, so they shouldn't be affected by other arches that may not have DIE_NMIUNKNOWN defined. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Russ Anderson <rja@sgi.com> Cc: Corey Minyard <minyard@acm.org> Cc: openipmi-developer@lists.sourceforge.net Cc: dann frazier <dannf@hp.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:08:52 +01:00
Huang Ying	1c7b74d46f	x86, NMI: Add NMI symbol constants and rename memory parity to PCI SERR Replace the NMI related magic numbers with symbol constants. Memory parity error is only valid for IBM PC-AT, newer machine use bit 7 (0x80) of 0x61 port for PCI SERR. While memory error is usually reported via MCE. So corresponding function name and kernel log string is changed. But on some machines, PCI SERR line is still used to report memory errors. This is used by EDAC, so corresponding EDAC call is reserved. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1294348732-15030-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 15:08:51 +01:00
Ingo Molnar	1c2a48cf65	Merge branch 'linus' into x86/apic-cleanups Conflicts: arch/x86/include/asm/io_apic.h Merge reason: Resolve the conflict, update to a more recent -rc base Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 14:14:15 +01:00
David Rientjes	d906f0eb2f	x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation "x86, numa: Fake node-to-cpumask for NUMA emulation" broke the build when CONFIG_DEBUG_PER_CPU_MAPS is set and CONFIG_NUMA_EMU is not. This is because it is possible to map a cpu to multiple nodes when NUMA emulation is used; the patch required a physical node address table to find those nodes that was only available when CONFIG_NUMA_EMU was enabled. This extracts the common debug functionality to its own function for CONFIG_DEBUG_PER_CPU_MAPS and uses it regardless of whether CONFIG_NUMA_EMU is set or not. NUMA emulation will now iterate over the set of possible nodes for each cpu and call the new debug function whereas only the cpu's node will be used without NUMA emulation enabled. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <alpine.DEB.2.00.1012301053590.12995@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 14:09:34 +01:00
Rafael J. Wysocki	976513dbfc	PM / ACPI: Move NVS saving and restoring code to drivers/acpi The saving of the ACPI NVS area during hibernation and suspend and restoring it during the subsequent resume is entirely specific to ACPI, so move it to drivers/acpi and drop the CONFIG_SUSPEND_NVS configuration option which is redundant. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Len Brown <len.brown@intel.com>	2011-01-07 00:36:55 -05:00
Linus Torvalds	cb600d2f83	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mm: Initialize initial_page_table before paravirt jumps	2011-01-06 11:12:17 -08:00
Linus Torvalds	47935a731b	Merge branches 'x86-alternatives-for-linus', 'x86-fpu-for-linus', 'x86-hwmon-for-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, asm: Use fxsaveq/fxrestorq in more places * 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hwmon: Add core threshold notification to therm_throt.c * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Use native_halt on a halt, not native_safe_halt * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: locking, lockdep: Convert sprintf_symbol to %pS * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: Better struct irqaction layout	2011-01-06 11:11:50 -08:00
Linus Torvalds	77a0dd54ba	Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV, BAU: Extend for more than 16 cpus per socket x86, UV: Fix the effect of extra bits in the hub nodeid register x86, UV: Add common uv_early_read_mmr() function for reading MMRs	2011-01-06 11:09:57 -08:00
Linus Torvalds	d7a5a18190	Merge branch 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-tsc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Check tsc available/disabled in the delayed init function x86: Improve TSC calibration using a delayed workqueue x86: Make tsc=reliable override boot time stability checks	2011-01-06 11:08:14 -08:00
Linus Torvalds	4f00b901d4	Merge branch 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: module: Move RO/NX module protection to after ftrace module update x86: Resume trampoline must be executable x86: Add RO/NX protection for loadable kernel modules x86: Add NX protection for kernel data x86: Fix improper large page preservation	2011-01-06 11:07:33 -08:00
Linus Torvalds	b4c6e2ea5e	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, earlyprintk: Move mrst early console to platform/ and fix a typo x86, apbt: Setup affinity for apb timers acting as per-cpu timer ce4100: Add errata fixes for UART on CE4100 x86: platform: Move iris to x86/platform where it belongs x86, mrst: Check platform_device_register() return code x86/platform: Add Eurobraille/Iris power off support x86, mrst: Add explanation for using 1960 as the year offset for vrtc x86, mrst: Fix dependencies of "select INTEL_SCU_IPC" x86, mrst: The shutdown for MRST requires the SCU IPC mechanism x86: Ce4100: Add reboot_fixup() for CE4100 ce4100: Add PCI register emulation for CE4100 x86: Add CE4100 platform support x86: mrst: Set vRTC's IRQ to level trigger type x86: mrst: Add audio driver bindings rtc: Add drivers/rtc/rtc-mrst.c x86: mrst: Add vrtc driver which serves as a wall clock device x86: mrst: Add Moorestown specific reboot/shutdown support x86: mrst: Parse SFI timer table for all timer configs x86/mrst: Add SFI platform device parsing code	2011-01-06 11:06:31 -08:00
Linus Torvalds	6f46b120a9	Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, microcode, AMD: Cleanup code a bit x86, microcode, AMD: Replace vmalloc+memset with vzalloc	2011-01-06 11:06:09 -08:00
Linus Torvalds	4e1db5e58a	Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: apic, amd: Make firmware bug messages more meaningful mce, amd: Remove goto in threshold_create_device() mce, amd: Add helper functions to setup APIC mce, amd: Shorten local variables mci_misc_{hi,lo} mce, amd: Implement mce_threshold_block_init() helper function	2011-01-06 11:05:21 -08:00
Linus Torvalds	37d9a8c5ea	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix included-by file reference comments x86, cpu: Only CPU features determine NX capabilities x86, cpu: Call verify_cpu during 32bit CPU startup x86, cpu: Clear XD_DISABLED flag on Intel to regain NX x86, cpu: Rename verify_cpu_64.S to verify_cpu.S	2011-01-06 10:56:02 -08:00
Linus Torvalds	017892c341	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation x86, acpi: Add MAX_LOCAL_APIC for 32bit x86: io_apic: Split setup_ioapic_ids_from_mpc() x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() x86: Allow platforms to force enable apic	2011-01-06 10:51:36 -08:00
Linus Torvalds	42cbd8efb0	Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cacheinfo: Cleanup L3 cache index disable support x86, amd-nb: Cleanup AMD northbridge caching code x86, amd-nb: Complete the rename of AMD NB and related code	2011-01-06 10:50:28 -08:00
Huang Ying	74d91e3c6a	x86, NMI: Add touch_nmi_watchdog to io_check_error delay Prevent the long delay in io_check_error making NMI watchdog timeout. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-05 14:22:58 +01:00
Dongdong Deng	554ec06398	x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time The spin_lock_debug/rcu_cpu_stall detector uses trigger_all_cpu_backtrace() to dump cpu backtrace. Therefore it is possible that trigger_all_cpu_backtrace() could be called at the same time on different CPUs, which triggers an 'unknown reason NMI' warning. The following case illustrates the problem: CPU1 CPU2 ... CPU N trigger_all_cpu_backtrace() set "backtrace_mask" to cpu mask \| generate NMI interrupts generate NMI interrupts ... \ \| / \ \| / The "backtrace_mask" will be cleaned by the first NMI interrupt at nmi_watchdog_tick(), then the following NMI interrupts generated by other cpus' arch_trigger_all_cpu_backtrace() will be taken as unknown reason NMI interrupts. This patch uses a test_and_set to avoid the problem, and stop the arch_trigger_all_cpu_backtrace() from calling to avoid dumping a double cpu backtrace info when there is already a trigger_all_cpu_backtrace() in progress. Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com> Cc: fweisbec@gmail.com LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Don Zickus <dzickus@redhat.com>	2011-01-05 14:22:57 +01:00
Don Zickus	9ab181fa9f	x86: Only call smp_processor_id in non-preempt cases There are some paths that walk the die_chain with preemption on. Make sure we are in an NMI call before we start doing anything. This was triggered by do_general_protection calling notify_die with DIE_GPF. Reported-by: Jan Kiszka <jan.kiszka@web.de> Signed-off-by: Don Zickus <dzickus@redhat.com> LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-05 14:22:57 +01:00
Ingo Molnar	aef1b9cef7	Merge commit 'v2.6.37' into perf/core Merge reason: Add the final .37 tree. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-05 14:22:10 +01:00
Yinghai Lu	cb2ded37fd	x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion On one x2apic pre-enabled system, x2apic_mode suddenly got corrupted after registering some cpus, when compiled with CONFIG_NR_CPUS=255 instead of 512. It turns out that generic_processor_info() ==> phyid_set(apicid, phys_cpu_present_map) causes the problem. phys_cpu_present_map is sized by MAX_APICS bits, and on the pre-enabled system some cpus have an apic id > 255. The variables after phys_cpu_present_map may get corrupted silently: ffffffff828e8420 B phys_cpu_present_map ffffffff828e8440 B apic_verbosity ffffffff828e8444 B local_apic_timer_c2_ok ffffffff828e8448 B disable_apic ffffffff828e844c B x2apic_mode ffffffff828e8450 B x2apic_disabled ffffffff828e8454 B num_processors ... Actually phys_cpu_present_map is referenced via apic id, instead of index. We should use MAX_LOCAL_APIC instead of MAX_APICS. For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes on 64-bit: text data bss dec filename 21696943 4193748 12787712 38678403 vmlinux.before 21696943 4193748 12791808 38682499 vmlinux.after No change on 32bit. Finally we can remove MAX_APICS that was rather confusing. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4D23BD9C.3070102@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-05 14:09:23 +01:00
Yinghai Lu	f005fe12b9	x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping() It is not related to init_memory_mapping(), and init_memory_mapping() is getting bigger. So make it a separate function and call it from reserve_brk(), which is the point where _brk_end is concluded. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933E0.7090305@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-01-05 14:03:45 +01:00
Rusty Russell	d50d8fe192	x86, mm: Initialize initial_page_table before paravirt jumps v2.6.36-rc8-54-gb40827f (x86-32, mm: Add an initial page table for core bootstrapping) made x86 boot using initial_page_table and broke lguest. For 2.6.37 we simply cut & paste the initialization code into lguest (`da32dac101` "lguest: populate initial_page_table"), now we fix it properly by doing that initialization before the paravirt jump. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Cc: lguest <lguest@ozlabs.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-04 09:53:50 +01:00
Ingo Molnar	bc030d6cb9	Merge commit 'v2.6.37-rc8' into x86/apic Conflicts: arch/x86/include/asm/io_apic.h Merge reason: move to a fresh -rc, resolve the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-04 09:43:42 +01:00
Thomas Renninger	25e41933b5	perf: Clean up power events by introducing new, more generic ones Add these new power trace events: power:cpu_idle power:cpu_frequency power:machine_suspend The old C-state/idle accounting events: power:power_start power:power_end Have now a replacement (but we are still keeping the old tracepoints for compatibility): power:cpu_idle and power:power_frequency is replaced with: power:cpu_frequency power:machine_suspend is newly introduced. Jean Pihet has a patch integrated into the generic layer (kernel/power/suspend.c) which will make use of it. the type= field got removed from both, it was never used and the type is differed by the event type itself. perf timechart userspace tool gets adjusted in a separate patch. Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Jean Pihet <jean.pihet@newoldbits.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: rjw@sisk.pl LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>	2011-01-04 08:16:54 +01:00
Ingo Molnar	cc22219699	Merge commit 'v2.6.37-rc8' into perf/core Merge reason: pick up latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-04 08:08:54 +01:00
Christoph Lameter	357089fca9	x86: udelay: Use this_cpu_read to avoid address calculation The code will use a segment prefix instead of doing the lookup and calculation. Signed-off-by: Christoph Lameter <cl@linux.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-01-04 06:08:55 +01:00
Cliff Wickman	cfa60917f0	x86, UV, BAU: Extend for more than 16 cpus per socket Fix a hard-coded limit of a maximum of 16 cpu's per socket. The UV Broadcast Assist Unit code initializes by scanning the cpu topology of the system and assigning a master cpu for each socket and UV hub. That scan had an assumption of a limit of 16 cpus per socket. With Westmere we are going over that limit. The UV hub hardware will allow up to 32. If the scan finds the system has gone over that limit it returns an error and we print a warning and fall back to doing TLB shootdowns without the BAU. Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: <stable@kernel.org> # .37.x LKML-Reference: <E1PZol7-0000mM-77@eag09.americas.sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-03 20:35:03 +01:00
R, Durgadoss	9e76a97efd	x86, hwmon: Add core threshold notification to therm_throt.c This patch adds code to therm_throt.c to notify core thermal threshold events. These thresholds are supported by the IA32_THERM_INTERRUPT register. The status/log for the same is monitored using the IA32_THERM_STATUS register. The necessary #defines are in msr-index.h. A call back is added to mce.h, to further notify the thermal stack, about the threshold events. Signed-off-by: Durgadoss R <durgadoss.r@intel.com> LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-01-03 08:30:30 -08:00
Robert Richter	c7c25802b3	arch/x86/oprofile/op_model_amd.c: Perform initialisation on a single CPU Disable preemption in init_ibs(). The function only checks the ibs capabilities and sets up pci devices (if necessary). It runs only on one cpu but operates with the local APIC and some MSRs, thus it is better to disable preemption. [ 7.034377] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/483 [ 7.034385] caller is setup_APIC_eilvt+0x155/0x180 [ 7.034389] Pid: 483, comm: modprobe Not tainted 2.6.37-rc1-20101110+ #1 [ 7.034392] Call Trace: [ 7.034400] [<ffffffff812a2b72>] debug_smp_processor_id+0xd2/0xf0 [ 7.034404] [<ffffffff8101e985>] setup_APIC_eilvt+0x155/0x180 [ ... ] Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22812 Reported-by: <atswartz@gmail.com> Signed-off-by: Robert Richter <robert.richter@amd.com> Cc: oprofile-list@lists.sourceforge.net <oprofile-list@lists.sourceforge.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Dan Carpenter <error27@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> [2.6.37.x] LKML-Reference: <20110103111514.GM4739@erda.amd.com> [ small cleanups ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-03 13:01:40 +01:00
Linus Torvalds	9109f4eb84	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: i8259: initialize isr_ack KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow	2011-01-02 10:44:21 -08:00
Avi Kivity	010c520e20	KVM: Don't reset mmu context unnecessarily when updating EFER The only bit of EFER that affects the mmu is NX, and this is already accounted for (LME only takes effect when changing cr0). Based on a patch by Hillf Danton. Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-02 12:05:15 +02:00
Avi Kivity	d0dfc6b74a	KVM: i8259: initialize isr_ack isr_ack is never initialized. So, until the first PIC reset, interrupts may fail to be injected. This can cause Windows XP to fail to boot, as reported in the fallout from the fix to https://bugzilla.kernel.org/show_bug.cgi?id=21962. Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2011-01-02 11:52:48 +02:00
Tejun Heo	c1955b5f3a	x86: Use this_cpu_inc_return for nmi counter this_cpu_inc_return() saves us a memory access there. Reviewed-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-30 12:22:17 +01:00
Tejun Heo	7b543a5334	x86: Replace uses of current_cpu_data with this_cpu ops Replace all uses of current_cpu_data with this_cpu operations on the per cpu structure cpu_info. The scalar accesses are replaced with the matching this_cpu ops which results in smaller and more efficient code. In the long run, it might be a good idea to remove cpu_data() macro too and use per_cpu macro directly. tj: updated description Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-30 12:22:03 +01:00
Tejun Heo	0a3aee0da4	x86: Use this_cpu_ops to optimize code Go through x86 code and replace __get_cpu_var and get_cpu_var instances that refer to a scalar and are not used for address determinations. Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-30 12:20:28 +01:00
Ingo Molnar	56f4c40034	Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core	2010-12-30 11:26:45 +01:00
Yinghai Lu	1411e0ec31	x86-64, numa: Put pgtable to local node memory Introduce init_memory_mapping_high(), and use it with 64bit. It will go with every memory segment above 4g to create page table to the memory range itself. before this patch all page tables was on one node. with this patch, one RED-PEN is killed debug out for 8 sockets system after patch [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] RAMDISK: 7bc84000 - 7f745000 .... [ 0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used [ 0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used [ 0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff] [ 0.000000] 0100000000 - 1080000000 page 2M [ 0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff] [ 0.000000] memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff] [ 0.000000] 1080000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] 
memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff] [ 0.000000] 2080000000 - 3080000000 page 2M [ 0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff] [ 0.000000] memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff] [ 0.000000] 3080000000 - 4080000000 page 2M [ 0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff] [ 0.000000] memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff] [ 0.000000] 4080000000 - 5080000000 page 2M [ 0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff] [ 0.000000] memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff] [ 0.000000] 5080000000 - 6080000000 page 2M [ 0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff] [ 0.000000] memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff] [ 0.000000] 6080000000 - 7080000000 page 2M [ 0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff] [ 0.000000] memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff] [ 0.000000] 7080000000 - 8080000000 page 2M [ 0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff] [ 0.000000] memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff] PGTABLE [ 0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff] [ 0.000000] NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff] [ 0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff] [ 0.000000] NODE_DATA [0x0000207ffbb000-0x0000207ffbffff] [ 0.000000] Initmem 
setup node 2 [0000002080000000-000000307fffffff] [ 0.000000] NODE_DATA [0x0000307ffbb000-0x0000307ffbffff] [ 0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff] [ 0.000000] NODE_DATA [0x0000407ffbb000-0x0000407ffbffff] [ 0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff] [ 0.000000] NODE_DATA [0x0000507ffbb000-0x0000507ffbffff] [ 0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff] [ 0.000000] NODE_DATA [0x0000607ffbb000-0x0000607ffbffff] [ 0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff] [ 0.000000] NODE_DATA [0x0000707ffbb000-0x0000707ffbffff] [ 0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff] [ 0.000000] NODE_DATA [0x0000807ffba000-0x0000807ffbefff] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933D1.9020609@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 15:48:08 -08:00
Yinghai Lu	dbef7b56d2	x86-64, numa: Allocate memnodemap under max_pfn_mapped We need to access it right away, so make sure that it is mapped already. This prepares for putting the page table on the local node, since nodemap is used before that. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933C8.7060105@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 15:48:08 -08:00
Yinghai Lu	45635ab5e4	x86: Change get_max_mapped() to inline Move it into a header file to prepare for using it in other files. [ hpa: added missing <linux/types.h> and changed type to phys_addr_t. ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933BA.8000508@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 15:47:55 -08:00
Yinghai Lu	32e3f2b00c	x86-64, gart: Fix allocation with memblock When trying to change alloc_bootmem with memblock to go with real top-down, found one old system: [ 0.000000] Node 0: aperture @ ac000000 size 64 MB [ 0.000000] Aperture pointing to e820 RAM. Ignoring. [ 0.000000] Your BIOS doesn't leave a aperture memory hole [ 0.000000] Please enable the IOMMU option in the BIOS setup [ 0.000000] This costs you 64 MB of RAM [ 0.000000] memblock_x86_reserve_range: [0x2020000000-0x2023ffffff] aperture64 [ 0.000000] Cannot allocate aperture memory hole (ffff882020000000,65536K) [ 0.000000] memblock_x86_free_range: [0x2020000000-0x2023ffffff] [ 0.000000] Kernel panic - not syncing: Not enough memory for aperture [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.37-rc5-tip-yh-06229-gb792dc2-dirty #331 [ 0.000000] Call Trace: [ 0.000000] [<ffffffff81cf50fe>] ? panic+0x91/0x1a3 [ 0.000000] [<ffffffff827c66b2>] ? gart_iommu_hole_init+0x3d7/0x4a3 [ 0.000000] [<ffffffff81d026a9>] ? _etext+0x0/0x3 [ 0.000000] [<ffffffff827ba940>] ? pci_iommu_alloc+0x47/0x71 [ 0.000000] [<ffffffff827c820b>] ? mem_init+0x19/0xec [ 0.000000] [<ffffffff827b3c40>] ? start_kernel+0x20a/0x3e8 [ 0.000000] [<ffffffff827b32cc>] ? x86_64_start_reservations+0x9c/0xa0 [ 0.000000] [<ffffffff827b33e4>] ? x86_64_start_kernel+0x114/0x11b This means __alloc_bootmem_nopanic() allocates too high for that aperture. Use memblock_find_in_range() with a limit directly. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0C0740.90104@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 14:46:54 -08:00
Yinghai Lu	4b239f458c	x86-64, mm: Put early page table high While debugging kdump, found that the current kernel has a problem with crashkernel=512M. It turns out that the initial mapping is to 512M, and the later initial mapping to 4G (actually 2040M on my platform) will put the page table near 512M. Then the initial mapping to 128g will be near 2g. before this patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x1fffc000-0x1fffffff] [ 0.000000] memblock_x86_reserve_range: [0x1fffc000-0x1fffdfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x7bc01000-0x7bc83fff] [ 0.000000] memblock_x86_reserve_range: [0x7bc01000-0x7bc7efff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] crashkernel reservation failed - No suitable area found. 
after patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] memblock_x86_reserve_range: [0x7f74c000-0x7f74dfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ff7d000-0x207fffafff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] memblock_x86_reserve_range: [0x17000000-0x36ffffff] CRASH KERNEL [ 0.000000] Reserving 512MB of memory at 368MB for crashkernel (System RAM: 133120MB) This means that with the patch, the page table for [0, 2g) will be near 2g instead of under 512M, and the page table for [4g, 128g) will be near 128g instead of under 2g. That is good: if we have lots of memory above 4g, like 1024g, 2048g or 16T, the related page tables will not be put under 2g, where they could otherwise fill up the space under 2g if 1G or 2M pages are not used. The code change adds map_low_page() and updates unmap_low_page() for 64bit, and uses them to access the corresponding high memory for page table setup. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0C0734.7060900@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 14:46:54 -08:00
H. Peter Anvin	d50e8fc7e3	Merge branch 'x86/apic-cleanups' into x86/numa	2010-12-29 11:36:26 -08:00
Avi Kivity	649497d1a3	KVM: MMU: Fix incorrect direct gfn for unpaged mode shadow We use the physical address instead of the base gfn for the four PAE page directories we use in unpaged mode. When the guest accesses an address above 1GB that is backed by a large host page, a BUG_ON() in kvm_mmu_set_gfn() triggers. Resolves: https://bugzilla.kernel.org/show_bug.cgi?id=21962 Reported-and-tested-by: Nicolas Prochazka <prochazka.nicolas@gmail.com> KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-29 12:35:29 +02:00
Cliff Wickman	c8217b8305	x86, paravirt: Use native_halt on a halt, not native_safe_halt halt() should use native_halt() safe_halt() uses native_safe_halt() If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as static inline void halt(void) { PVOP_VCALL0(pv_irq_ops.safe_halt); } Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is static inline void halt(void) { native_halt(); } So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt() for a halt() is an oversight. Am I missing something? It probably hasn't shown up as a problem because the local apic is disabled on a shutdown or restart. But if we disable interrupts and call halt() we shouldn't expect that the halt() will re-enable interrupts. Signed-off-by: Cliff Wickman <cpw@sgi.com> LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-27 14:02:11 -08:00
Jesper Juhl	5cdd2de0a7	x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() In arch/x86/kernel/microcode_intel.c::generic_load_microcode() we have this: while (leftover) { ... if (get_ucode_data(mc, ucode_ptr, mc_size) \|\| microcode_sanity_check(mc) < 0) { vfree(mc); break; } ... } if (mc) vfree(mc); This will cause a double free of 'mc'. This patch fixes that by just removing the vfree() call in the loop since 'mc' will be freed nicely just after we break out of the loop. There's also a second change in the patch. I noticed a lot of checks for pointers being NULL before passing them to vfree(). That's completely redundant since vfree() deals gracefully with being passed a NULL pointer. Removing the redundant checks yields a nice size decrease for the object file. Size before the patch: text data bss dec hex filename 4578 240 1032 5850 16da arch/x86/kernel/microcode_intel.o Size after the patch: text data bss dec hex filename 4489 240 984 5713 1651 arch/x86/kernel/microcode_intel.o Signed-off-by: Jesper Juhl <jj@chaosbits.net> Acked-by: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-27 14:33:30 +01:00
Linus Torvalds	79534f237f	Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf probe: Fix to support libdwfl older than 0.148 perf tools: Fix lazy wildcard matching perf buildid-list: Fix error return for success perf buildid-cache: Fix symbolic link handling perf symbols: Stop using vmlinux files with no symbols perf probe: Fix use of kernel image path given by 'k' option * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, kexec: Limit the crashkernel address appropriately	2010-12-23 15:39:40 -08:00
David Rientjes	a387e95a49	x86, numa: Fix cpu to node mapping for sparse node ids NUMA boot code assumes that physical node ids start at 0, but the DIMMs that the apic id represents may not be reachable. If this is the case, node 0 is never online and cpus never end up getting appropriately assigned to a node. This causes the cpumask of all online nodes to be empty and machines crash with kernel code assuming online nodes have valid cpus. The fix is to appropriately map all the address ranges for physical nodes and ensure the cpu to node mapping function checks all possible nodes (up to MAX_NUMNODES) instead of simply checking nodes 0-N, where N is the number of physical nodes, for valid address ranges. This requires no longer "compressing" the address ranges of nodes in the physical node map from 0-N, but rather leave indices in physnodes[] to represent the actual node id of the physical node. Accordingly, the topology exported by both amd_get_nodes() and acpi_get_nodes() no longer must return the number of nodes to iterate through; all such iterations will now be to MAX_NUMNODES. This change also passes the end address of system RAM (which may be different from normal operation if mem= is specified on the command line) before the physnodes[] array is populated. ACPI parsed nodes are truncated to fit within the address range that respect the mem= boundaries and even some physical nodes may become unreachable in such cases. When NUMA emulation does succeed, any apicid to node mapping that exists for unreachable nodes are given default values so that proximity domains can still be assigned. This is important for node_distance() to function as desired. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221702090.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:16 -08:00
David Rientjes	c1c3443c9c	x86, numa: Fake node-to-cpumask for NUMA emulation It's necessary to fake the node-to-cpumask mapping so that an emulated node ID returns a cpumask that includes all cpus that have affinity to the memory it represents. This is a little intrusive because it requires knowledge of the physical topology of the system. setup_physnodes() gives us that information, but since NUMA emulation ends up altering the physnodes array, it's necessary to reset it before cpus are brought online. Accordingly, the physnodes array is moved out of init.data and into cpuinit.data since it will be needed on cpuup callbacks. This works regardless of whether numa=fake is used on the command line, or the setup of the fake node succeeds or fails. The physnodes array always contains the physical topology of the machine if CONFIG_NUMA_EMU is enabled and can be used to setup the correct node-to-cpumask mappings in all cases since setup_physnodes() is called whenever the array needs to be repopulated with the correct data. To fake the actual mappings, numa_add_cpu() and numa_remove_cpu() are rewritten for CONFIG_NUMA_EMU so that we first find the physical node to which each cpu has local affinity, then iterate through all online nodes to find the emulated nodes that have local affinity to that physical node, and then finally map the cpu to each of those emulated nodes. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221701520.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:15 -08:00
David Rientjes	f51bf3073a	x86, numa: Fake apicid and pxm mappings for NUMA emulation This patch adds the equivalent of acpi_fake_nodes() for AMD Northbridge platforms. The goal is to fake the apicid-to-node mappings for NUMA emulation so the physical topology of the machine is correctly maintained within the kernel. This change also fakes proximity domains for both ACPI and k8 code so the physical distance between emulated nodes is maintained via node_distance(). This exports the correct distances via /sys/devices/system/node/.../distance based on the underlying topology. A new helper function, fake_physnodes(), is introduced to invoke the correct NUMA code to fake these two mappings based on the system type. If there is no underlying NUMA configuration, all cpus are mapped to node 0 for local distance. Since acpi_fake_nodes() is no longer called with CONFIG_ACPI_NUMA, its prototype can be removed from the header file for such a configuration. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221701360.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:14 -08:00
David Rientjes	4e76f4e67a	x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU Both acpi_get_nodes() and amd_get_nodes() are only necessary when CONFIG_NUMA_EMU is enabled, so avoid compiling them when the option is disabled. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221701210.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:12 -08:00
David Rientjes	34dc9e7496	x86, numa: Reduce minimum fake node size to 32M This patch changes the minimum fake node size from 64MB to 32MB so it is possible to test NUMA code at a greater scale on smaller machines (64 nodes on a 2G machine, 1024 nodes on 32G machine with CONFIG_NODES_SHIFT=10). Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221700590.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:10 -08:00
Yinghai Lu	d3bd058826	x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation Recent Intel systems use a different order in the MADT: they list all thread0 entries first, then all thread1 entries. The SRAT table still uses the old order and lists the cpus of one socket together. If the user has compiled with a limited NR_CPUS or boots with nr_cpus=, some cpus' apic id to node mappings may be missing from apicid_to_node[]. For example, a 4 socket system with 64 cpus booted with nr_cpus=32 will crash... [ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS). [ 9.235021] divide error: 0000 [#1] SMP [ 9.235315] last sysfs file: [ 9.235481] CPU 1 [ 9.235592] Modules linked in: [ 9.245398] [ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800 [ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623 ... [ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623 [ 9.665356] RSP <ffff88103f8d1c40> [ 9.665568] ---[ end trace 2296156d35fdfc87 ]--- So let's just parse all cpu entries in SRAT. Also add apicid checking against MAX_LOCAL_APIC, in case we go out of the boundaries of apicid_to_node[]. This fixes the following bug too: https://bugzilla.kernel.org/show_bug.cgi?id=22662 -v2: expand to 32bit according to hpa; need to add MAX_LOCAL_APIC for 32bit Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com> Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Tested-by: Myron Stowe <myron.stowe@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0AD486.9020704@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 13:16:18 -08:00
Yinghai Lu	56d91f132c	x86, acpi: Add MAX_LOCAL_APIC for 32bit We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number of local apics. Also apic_version[] array should use MAX_LOCAL_APICs. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0AD464.2020408@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 13:15:53 -08:00
Seth Heasley	9b444b36fe	x86/PCI: irq and pci_ids patch for Intel Patsburg This patch adds an additional LPC Controller DeviceID for the Intel Patsburg PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-23 12:53:10 -08:00
Ingo Molnar	26e20a108c	Merge commit 'v2.6.37-rc7' into x86/security	2010-12-23 09:48:41 +01:00
Don Zickus	4a7863cc2e	x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR The x86 arch has shifted its use of the nmi_watchdog from a local implementation to the global one provided by kernel/watchdog.c. This shift has caused a whole bunch of compile problems under different config options. I attempt to simplify things with the patch below. In order to simplify things, I had to come to terms with the meaning of two terms ARCH_HAS_NMI_WATCHDOG and CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing, the former on a local level and the latter on a global level. With the old x86 nmi watchdog gone, there is no need to rely on defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't make sense any more. x86 will now use the global implementation. The changes below do a few things. First it changes the few places that relied on ARCH_HAS_NMI_WATCHDOG to use CONFIG_X86_LOCAL_APIC (the former was an alias for the latter anyway, so nothing unusual here). Those pieces of code were relying more on local apic functionality than the nmi watchdog functionality, so the change should make sense. Second, I removed the x86 implementation of touch_nmi_watchdog(). It isn't needed now; instead x86 will rely on kernel/watchdog.c's implementation. Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from x86. And tweaked the include/linux/nmi.h file to tell users to look for an externally defined touch_nmi_watchdog in the case of ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This change removes some of the ugliness in that file. Finally, I added a Kconfig dependency for CONFIG_HARDLOCKUP_DETECTOR that said you can't have ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can only have one nmi_watchdog. 
Tested with ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig, (various broken configs) Hopefully, after this patch I won't get any more compile broken emails. :-) v3: changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: fweisbec@gmail.com LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-22 22:15:32 +01:00
Jiri Kosina	4b7bd36470	Merge branch 'master' into for-next Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.	2010-12-22 18:57:02 +01:00
Jack Steiner	d8850ba425	x86, UV: Fix the effect of extra bits in the hub nodeid register UV systems can be partitioned into multiple independent SSIs. Large partitioned systems may have extra bits in the node_id register. These bits are used when the total memory on all SSIs exceeds 16TB. These extra bits need to be ignored when calculating x2apic_extra_bits. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20101130195926.972776133@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-22 12:31:15 +01:00
Jack Steiner	e681041388	x86, UV: Add common uv_early_read_mmr() function for reading MMRs Early in boot, reading MMRs from the UV hub controller requires calls to early_ioremap()/early_iounmap(). Rather than duplicating code, add a common function to do the map/read/unmap. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20101130195926.834804371@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-22 12:31:15 +01:00
Ingo Molnar	6c529a266b	Merge commit 'v2.6.37-rc7' into perf/core Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-22 11:53:23 +01:00
Linus Torvalds	55ec86f848	Merge branches 'x86-fixes-for-linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32: Make sure we can map all of lowmem if we need to x86, vt-d: Handle previous faults after enabling fault handling x86: Enable the intr-remap fault handling after local APIC setup x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() bootmem: Add alloc_bootmem_align() x86, gcc-4.6: Use gcc -m options when building vdso x86: HPET: Chose a paranoid safe value for the ETIME check x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix off by one in perf_swevent_init() perf: Fix duplicate events with multiple-pmu vs software events ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO scripts/tags.sh: Add magic for trace-events tracing: Fix panic when lseek() called on "trace" opened for writing	2010-12-19 10:44:54 -08:00
Robert Richter	da169f5df2	oprofile, x86: Add support for 6 counters (AMD family 15h) This patch adds support for up to 6 hardware counters for AMD family 15h cpus. There is a new MSR range for hardware counters beginning at MSRC001_0200 Performance Event Select (PERF_CTL0). Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-12-19 11:43:08 +01:00
Robert Richter	30570bced1	oprofile, x86: Add support for AMD family 15h This patch adds support for AMD family 15h (Interlagos/Valencia/ Zambezi) cpus. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-12-19 11:43:04 +01:00
Linus Torvalds	46bdfe6a50	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86: avoid high BIOS area when allocating address space x86: avoid E820 regions when allocating address space x86: avoid low BIOS area when allocating address space resources: add arch hook for preventing allocation in reserved areas Revert "resources: support allocating space within a region from the top down" Revert "PCI: allocate bus resources from the top down" Revert "x86/PCI: allocate space from the end of a region, not the beginning" Revert "x86: allocate space within a region top-down" Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode" PCI: Update MCP55 quirk to not affect non HyperTransport variants	2010-12-18 10:13:24 -08:00
Tejun Heo	05c2d088d0	Merge branch 'this_cpu_ops' into for-2.6.38	2010-12-18 15:54:36 +01:00
Christoph Lameter	8270137a0d	cpuops: Use cmpxchg for xchg to avoid lock semantics Use cmpxchg instead of xchg to realize this_cpu_xchg. xchg will cause LOCK overhead since LOCK is always implied but cmpxchg will not. Baselines: xchg() = 18 cycles (no segment prefix, LOCK semantics) __this_cpu_xchg = 1 cycle (simulated using this_cpu_read/write, two prefixes. Looks like the cpu can use loop optimization to get rid of most of the overhead) Cycles before: this_cpu_xchg = 37 cycles (segment prefix and LOCK (implied by xchg)) After: this_cpu_xchg = 11 cycle (using cmpxchg without lock semantics) Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-18 15:54:04 +01:00
Christoph Lameter	7296e08aba	x86: this_cpu_cmpxchg and this_cpu_xchg operations Provide support as far as the hardware capabilities of the x86 cpus allow. Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for fast cpuops implementations. V1->V2: - Take out the definition for this_cpu_cmpxchg_8 and move it into a separate patch. tj: - Reordered ops to better follow this_cpu_* organization. - Renamed macro temp variables similar to their existing neighbours. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-18 15:54:04 +01:00
H. Peter Anvin	7f8595bfac	x86, kexec: Limit the crashkernel address appropriately Keep the crash kernel address below 512 MiB for 32 bits and 896 MiB for 64 bits. For 32 bits, this retains compatibility with earlier kernel releases, and makes it work even if the vmalloc= setting is adjusted. For 64 bits, we should be able to increase this substantially once a hard-coded limit in kexec-tools is fixed. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20101217195035.GE14502@redhat.com>	2010-12-17 15:04:00 -08:00
Bjorn Helgaas	a2c606d53a	x86: avoid high BIOS area when allocating address space This prevents allocation of the last 2MB before 4GB. The experiment described here shows Windows 7 ignoring the last 1MB: https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27 This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin says "There will be ROM at the top of the 32-bit address space; it's a fact of the architecture, and on at least older systems it was common to have a shadow 1 MiB below." Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:01:30 -08:00
Bjorn Helgaas	4dc2287c18	x86: avoid E820 regions when allocating address space When we allocate address space, e.g., to assign it to a PCI device, don't allocate anything mentioned in the BIOS E820 memory map. On recent machines (2008 and newer), we assign PCI resources from the windows described by the ACPI PCI host bridge _CRS. On many Dell machines, these windows overlap some E820 reserved areas, e.g., BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved) pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff] If we put devices at 0xbff00000, they don't work, probably because that's really RAM, not I/O memory. This patch prevents that by removing the 0xbfe4dc00-0xbfffffff area from the "available" resource. I'm not very happy with this solution because Windows solves the problem differently (it seems to ignore E820 reserved areas and it allocates top-down instead of bottom-up; details at comment 45 of the bugzilla below). That means we're vulnerable to BIOS defects that Windows would not trip over. For example, if BIOS described a device in ACPI but didn't mention it in E820, Windows would work fine but Linux would fail. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228 Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:01:24 -08:00
Bjorn Helgaas	30919b0bf3	x86: avoid low BIOS area when allocating address space This implements arch_remove_reservations() so allocate_resource() can avoid any arch-specific reserved areas. This currently just avoids the BIOS area (the first 1MB), but could be used for E820 reserved areas if that turns out to be necessary. We previously avoided this area in pcibios_align_resource(). This patch moves the test from that PCI-specific path to a generic path, so all resource allocations will avoid this area. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:01:17 -08:00
Bjorn Helgaas	d14125ecfe	Revert "x86/PCI: allocate space from the end of a region, not the beginning" This reverts commit `dc9887dc02`. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:00:49 -08:00
Bjorn Helgaas	5e52f1c5e8	Revert "x86: allocate space within a region top-down" This reverts commit `1af3c2e45e`. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-12-17 10:00:43 -08:00
Linus Torvalds	a6ac1f0af4	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: Fix preemption counter leak in kvm_timer_init() KVM: enlarge number of possible CPUID leaves KVM: SVM: Do not report xsave in supported cpuid KVM: Fix OSXSAVE after migration	2010-12-17 09:32:39 -08:00
Tejun Heo	403047754c	percpu,x86: relocate this_cpu_add_return() and friends - include/linux/percpu.h: this_cpu_add_return() and friends were located next to __this_cpu_add_return(). However, the overall organization is to first group by preemption safeness. Relocate this_cpu_add_return() and friends to preemption-safe area. - arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after other more basic operations. Relocate [__]this_cpu_add_return_8() so that they're first grouped by preemption safeness. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com>	2010-12-17 16:13:22 +01:00
Tejun Heo	275c8b9328	Merge branch 'this_cpu_ops' into for-2.6.38	2010-12-17 15:16:46 +01:00
Christoph Lameter	8f1d97c79e	x86: Support for this_cpu_add, sub, dec, inc_return Supply an implementation for x86 in order to generate more efficient code. V2->V3: - Cleanup - Remove strange type checking from percpu_add_return_op. tj: - Dropped unused typedef from percpu_add_return_op(). - Renamed ret__ to paro_ret__ in percpu_add_return_op(). - Minor indentation adjustments. Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-17 15:15:28 +01:00
Christoph Lameter	780f36d8b3	xen: Use this_cpu_ops Use this_cpu_ops to reduce code size and simplify things in various places. V3->V4: Move instance of this_cpu_inc_return to a later patchset so that this patch can be applied without infrastructure changes. Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-17 15:07:19 +01:00
Christoph Lameter	b76834bc1b	kprobes: Use this_cpu_ops Use this_cpu ops in various places to optimize per cpu data access. Cc: Jason Baron <jbaron@redhat.com> Cc: Namhyung Kim <namhyung@gmail.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2010-12-17 15:07:19 +01:00
H. Peter Anvin	147dd5610c	x86-32: Make sure we can map all of lowmem if we need to A relocatable kernel can be anywhere in lowmem -- and in the case of a kdump kernel, is likely to be fairly high. Since the early page tables map everything from address zero up we need to make sure we allocate enough brk that we can map all of lowmem if we need to. Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Tested-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0AD3ED.8070607@kernel.org>	2010-12-16 19:11:09 -08:00
Avi Kivity	3e26f23091	KVM: Fix preemption counter leak in kvm_timer_init() Based on a patch from Thomas Meyer. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-16 12:39:31 +02:00
Peter Zijlstra	7639dae0ca	perf, x86: Provide a PEBS capable cycle event Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-16 11:36:44 +01:00
Peter Zijlstra	2e80a82a49	perf: Dynamic pmu types Extend the perf_pmu_register() interface to allow for named and dynamic pmu types. Because we need to support the existing static types we cannot use dynamic types for everything, hence provide a type argument. If we want to enumerate the PMUs they need a name, provide one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222056.259707703@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-16 11:36:43 +01:00
Peter Zijlstra	4407204c5c	perf, x86: Detect broken BIOSes that corrupt the PMU Some BIOSes use PMU resources, which can cause various bugs: - Non-working or erratic PMU based statistics - the PMU can end up counting the wrong thing, resulting in misleading statistics - Profiling can stop working or it can profile the wrong thing - A non-working or erratic NMI watchdog that cannot be relied on - The kernel may disturb whatever thing the BIOS tries to use the PMU for - possibly causing hardware malfunction in extreme cases. - ... and other forms of potential misbehavior Various forms of such misbehavior have been observed in practice - there are BIOSes that just corrupt the PMU state, consequences be damned. The PMU is a CPU resource that is handled by the kernel and the BIOS stealing+corrupting it is neither acceptable nor robust, so we detect it, warn about it and further refuse to touch the PMU ourselves. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-16 11:36:42 +01:00
Ingo Molnar	006b20fe4c	Merge branch 'perf/urgent' into perf/core Merge reason: We want to apply a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-16 11:22:27 +01:00
Rusty Russell	da32dac101	lguest: populate initial_page_table Two x86 patches broke lguest: 1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator. In lguest, the host places linear page tables at the top of mem, which used to be enough to get us up to the swapper_pg_dir page tables. With the first patch, the direct mapping tables used that memory: Before: kernel direct mapping tables up to 4000000 @ 7000-1a000 After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000 I initially fixed this by lying about the amount of memory we had, so the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then... 2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table. This was initialized in a part of head_32.S which isn't executed by lguest; it is then copied into swapper_pg_dir. So we have to initialize it; and anyway we switch to it before we blatt the old tables, so that fixes the previous damage as well. For the moment, I cut & pasted the code into lguest's boot code, but next merge window I will merge them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> To: x86@kernel.org	2010-12-16 17:03:15 +10:30
Rusty Russell	bb4093deb2	lguest: restore boot speed lguest is dumb and drops all the pagetables for set_pte (which is only used for kernel mapping manipulation, so it's OK without highmem). But it's used a lot in boot, too. As a guest optimization, we suppressed this flushing until the first page switch. Now we have initial_page_table, that happens much earlier, so extend the heuristic to wait until we switch to something other than the swapper_pg_dir or initial_page_table. As measured on my laptop under kvm, this dropped the time-to-mount-root from 48 seconds to 4.3 seconds. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2010-12-16 17:03:15 +10:30
Rusty Russell	bb6f1d9a99	lguest: fix crash lguest_time_init `fe25c7fc2e` "x86: lguest: Convert to new irq chip functions" converted enable_lguest_irq() to take a struct irq_data *, but didn't fix the one internal caller. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> To: x86@kernel.org	2010-12-16 17:03:14 +10:30
Andres Salomon	b5318d302f	x86, olpc: Speed up device tree creation during boot Calling alloc_bootmem() for tiny chunks of memory over and over is really slow; on an XO-1, it caused the time between when the kernel started booting and when the display came alive (post-lxfb probe) to increase to 44s. This patch optimizes the prom_early_alloc function by calling alloc_bootmem for 4k-sized blocks of memory, and handing out chunks of that to callers. With this patch, the time between kernel load and display initialization decreased to 23s. If there's a better way to do this early in the boot process, please let me know. (Note: increasing the chunk size to 16k didn't noticeably affect boot time, and wasted 9k.) v4: clarify comment, requested by hpa v3: fix wasted memory buglet found by Milton Miller, and style fix. v2: reorder prom_early_alloc as suggested by Grant. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101129153951.74202a84@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:40 -08:00
Andres Salomon	c10d1e260f	x86, olpc: Add OLPC device-tree support Make use of PROC_DEVICETREE to export the tree, and sparc's PROMTREE code to call into OLPC's Open Firmware to build the tree. v5: fix buglet with root node check (introduced in v4) v4: address some minor style issues pointed out by Grant, and explicitly cast negative phandle checks to s32. v3: rename olpc_prom to olpc_dt - rework Kconfig entries - drop devtree build hook from proc, instead adding a call to x86's paging_init (similarly to how sparc64 does it) - switch allocation from using slab to alloc_bootmem. this allows the DT to be built earlier during boot (during setup_arch); the downside is that there are some 1200 bootmem reservations that are done during boot. Not ideal.. - add a helper olpc_ofw_is_installed function to test for the existence and successful detection of OLPC's OFW. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101116220952.26526a80@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:30 -08:00
Andres Salomon	4722d194e6	x86, of: Define irq functions to allow drivers/of/* to build on x86 - Define a stub irq_create_of_mapping for x86 as a stop-gap solution until drivers/of/irq is further along. - Define irq_dispose_mapping for x86 to appease of_i2c.c These are needed to allow stuff in drivers/of/ to build on x86. This stuff will eventually get replaced; quoting Grant, "The long term plan is to have the drivers/of/ code handling the mapping intelligently like powerpc currently does." But for now, just provide these functions. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101111214526.5de7121b@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:16 -08:00
Randy Dunlap	52f6c5ad43	crypto: ghash-intel - ghash-clmulni-intel_glue needs err.h Add missing header file: arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR' arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-12-15 19:44:08 +08:00
Kenji Kaneshige	7f7fbf45c6	x86: Enable the intr-remap fault handling after local APIC setup Interrupt-remapping gets enabled very early in the boot, as it determines the apic mode that the processor can use. The current code enables the vt-d fault handling before setup_local_APIC(), so the APIC LDR registers and data structure in memory may not be initialized yet. As a result, the vt-d fault handling in logical xapic/x2apic modes was broken. Fix this by enabling the vt-d fault handling in end_local_APIC_setup(). A cleaner fix of enabling fault handling while enabling intr-remapping will be addressed for v2.6.38. [ Enabling intr-remapping determines the usage of x2apic mode and the apic mode determines the fault-handling configuration. ] Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <20101201062244.541996375@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: stable@kernel.org [v2.6.32+] Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-13 16:53:32 -08:00
Kenji Kaneshige	086e8ced65	x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode In x2apic mode, we need to set the upper address register of the fault handling interrupt register of the vt-d hardware. Without this irq migration of the vt-d fault handling interrupt is broken. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: stable@kernel.org [v2.6.32+] Acked-by: Chris Wright <chrisw@sous-sol.org> Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-13 16:52:52 -08:00
Suresh Siddha	3fb82d56ad	x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume During suspend, we disable all the non boot cpus. And during resume we bring them all back again. So no need to do alternatives_smp_switch() in between. On my core 2 based laptop, this speeds up the suspend path by 15msec and the resume path by 5 msec (suspend/resume speed up differences can be attributed to the different P-states that the cpu is in during suspend/resume). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-13 16:23:56 -08:00
Suresh Siddha	10340ae130	x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem() Alignment of alloc_bootmem() depends on the value of L1_CACHE_SHIFT. What we need here, however, is 64 byte alignment. Use alloc_bootmem_align() and explicitly specify the alignment instead. This fixes a kernel boot crash reported by Jody when the cpu in .config is set to MPENTIUMII but the kernel is booted on a xsave-capable CPU. Reported-by: Jody Bruchon <jody@nctritech.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20101116212442.059967454@sbsiddha-MOBL3.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@kernel.org>	2010-12-13 16:13:11 -08:00
H. Peter Anvin	de2a8cf98e	x86, gcc-4.6: Use gcc -m options when building vdso The vdso Makefile passes linker-style -m options not to the linker but to gcc. This happens to work with earlier gcc, but fails with gcc 4.6. Pass gcc-style -m options, instead. Note: all currently supported versions of gcc support -m32, so there is no reason to conditionalize it any more. Reported-by: H. J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <tip-*@git.kernel.org> Cc: <stable@kernel.org>	2010-12-13 16:08:37 -08:00
Don Zickus	5f29805a4f	x86, watchdog: Compile fix when CONFIG_LOCAL_APIC not enabled When adjusting the code to handle removing the old nmi watchdog, I forgot to consider the compile case when the local apic is not enabled. This change fixes the following build error: arch/x86/kernel/apic/hw_nmi.c:28:6: error: redefinition of ‘touch_nmi_watchdog’ Signed-off-by: Don Zickus <dzickus@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Rakib Mullick <rakib.mullick@gmail.com> LKML-Reference: <20101213153719.GD18577@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-13 18:23:23 +01:00
Thomas Gleixner	f1c18071ad	x86: HPET: Chose a paranoid safe value for the ETIME check commit `995bd3bb5` (x86: Hpet: Avoid the comparator readback penalty) chose 8 HPET cycles as a safe value for the ETIME check, as we had the confirmation that the posted write to the comparator register is delayed by two HPET clock cycles on Intel chipsets which showed readback problems. After that patch hit mainline we got reports from machines with newer AMD chipsets which seem to have an even longer delay. See http://thread.gmane.org/gmane.linux.kernel/1054283 and http://thread.gmane.org/gmane.linux.kernel/1069458 for further information. Boris tried to come up with an ACPI based selection of the minimum HPET cycles, but this failed on a couple of test machines. And of course we did not get any useful information from the hardware folks. For now our only option is to choose a paranoid high and safe value for the minimum HPET cycles used by the ETIME check. Adjust the minimum ns value for the HPET clockevent accordingly. Reported-Bistected-and-Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6> Cc: Simon Kirby <sim@hostway.ca> Cc: Borislav Petkov <bp@alien8.de> Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com> Cc: John Stultz <johnstul@us.ibm.com>	2010-12-13 13:42:44 +01:00
Tadeusz Struk	3c097b8008	crypto: aesni-intel - Fixed build with binutils 2.16 This patch fixes the problem with 2.16 binutils. Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-12-13 19:51:15 +08:00
Thomas Gleixner	a8760eca6c	x86: Check tsc available/disabled in the delayed init function The delayed TSC init function does not check whether the system has no TSC or TSC is disabled at the kernel command line, which results in a crash in the work queue based extended calibration due to division by zero because the basic calibration never happened. Add the missing checks and do not touch TSC when not available or disabled. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <johnstul@us.ibm.com>	2010-12-13 11:35:05 +01:00
Tejun Heo	0aa002fe60	x86: apic: Cleanup and simplify setup_local_APIC() setup_local_APIC() is used to setup local APIC early during CPU initialization and already assumes that preemption is disabled on entry. However, the function unnecessarily disables and enables preemption and uses smp_processor_id() multiple times in and out of the nested preemption disabled section. This gives the wrong impression that the function might be able to handle being called with preemption enabled and/or migrated to another processor in the middle. Make it clear that the function is always called with preemption disabled, drop the confusing preemption disable block and call smp_processor_id() once at the beginning of the function. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: brgerst@gmail.com LKML-Reference: <4D00B3B9.7060702@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-10 13:46:26 +01:00
Don Zickus	5dc3055879	x86, NMI: Add back unknown_nmi_panic and nmi_watchdog sysctls Originally adapted from Huang Ying's patch which moved the unknown_nmi_panic to the traps.c file. Because the old nmi watchdog was deleted before this change happened, the unknown_nmi_panic sysctl was lost. This re-adds it. Also, the nmi_watchdog sysctl was re-implemented and its documentation updated accordingly. Patch-inspired-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: fweisbec@gmail.com LKML-Reference: <1291068437-5331-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-10 00:01:06 +01:00
Don Zickus	96a84c20d6	lockup detector: Compile fixes from removing the old x86 nmi watchdog My patch that removed the old x86 nmi watchdog broke other arches. This change reverts a piece of that patch and puts the change in the correct spot. Signed-off-by: Don Zickus <dzickus@redhat.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: fweisbec@gmail.com Cc: yinghai@kernel.org LKML-Reference: <1291068437-5331-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-10 00:01:06 +01:00
Feng Tang	0e3fa13f4e	x86: Further simplify mp_irq info handling assign_to_mp_irq() is copying the struct mpc_intsrc members one by one. That's silly. Use memcpy() and let the compiler figure it out. Same for the identical function assign_to_mpc_intsrc(). mp_irq_mpc_intsrc_cmp() is comparing the struct members one by one, but no caller ever checks the different return codes. Use memcmp() instead. Remove the extra printk in MP_ioapic_info(). Signed-off-by: Feng Tang <feng.tang@linux.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Alan Cox <alan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <20101208151857.212f0018@feng-i7> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:06 +01:00
Feng Tang	2d8009ba67	x86: Unify 3 similar ways of saving mp_irqs info There are 3 places defining similar functions of saving IRQ vector info into mp_irqs[] array: mpparse/acpi/mrst. Replace the redundant code by a common function in io_apic.c as it's only called when CONFIG_X86_IO_APIC=y. Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Alan Cox <alan@linux.intel.com> Cc: Len Brown <len.brown@intel.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20101207133204.4d913c5a@feng-i7> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:06 +01:00
Yinghai Lu	60d79fd99f	x86, ioapic: Avoid writing io_apic id if already correct For 32bit mptable path, setup_ids_from_mpc() always writes the io_apic id register, even if no change is needed. Skip the write when the readout and the mptable match. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> LKML-Reference: <4CFDF785.7010401@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	0450193bff	x86, x2apic: Don't map lapic addr for preenabled x2apic systems If x2apic is preenabled and used by the kernel, we don't need to map the lapic address. That mapping will never be used. So just skip that in register_lapic_address() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF69C.9070501@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	53301f36f3	x86, sfi: Use register_lapic_address() register_lapic_address() and mp_sfi_register_lapic_address() are almost identical. Use the common function. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4CFDF693.6000908@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	326a2e6bae	x86, apic: Use register_lapic_address() in init_apic_mapping() Remove the printk as well, we don't want to print when nothing changed. We print in register_lapic_address() already. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF68A.7020902@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:05 +01:00
Yinghai Lu	f115714163	x86, apic: Remove early_init_lapic_mapping() It is almost the same as smp_register_lapic_address(). We just need to let smp_read_mpc() call smp_register_lapic_address() when early==1. Add the apic_printk to smp_register_lapic_address(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF681.3030509@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:04 +01:00
Yinghai Lu	c0104d38a7	x86, apic: Unify identical register_lapic_address() functions They are the same, move the common function to apic.c to allow further cleanups. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Len Brown <lenb@kernel.org> LKML-Reference: <4CFDF675.4060305@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:04 +01:00
Thomas Gleixner	51ddafcbc7	Merge branch 'x86/platform' into x86/apic-cleanups Reason: apic cleanup series depends on x86/apic, x86/amd-nb and x86/platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 18:19:21 +01:00
Thomas Gleixner	d834a9dcec	Merge branch 'x86/amd-nb' into x86/apic-cleanups Reason: apic cleanup series depends on x86/apic, x86/amd-nb x86/platform Conflicts: arch/x86/include/asm/io_apic.h Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 18:17:25 +01:00
Thomas Gleixner	4720dd1b38	x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n arch/x86/kernel/apic/io_apic.c: In function 'ack_apic_level': arch/x86/kernel/apic/io_apic.c:2433: warning: unused variable 'desc' Signed-off-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <201010272107.o9RL7rse018212@imap1.linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 17:43:21 +01:00
Rakib Mullick	2c6cb1053a	x86: Address 'unused' warning in hw_nmi.c again arch/x86/kernel/apic/hw_nmi.c:29: warning: backtrace_mask defined but not used commit 0e2af2a9(x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG) addressed this warning, but it was reintroduced by commit 5f2b0ba4(x86, nmi_watchdog: Remove the old nmi_watchdog). Move backtrace_mask into the #ifdef arch_trigger_all_cpu_backtrace section again. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <AANLkTi=rcc38QzoKa6LFy4m++-p_9=Zt4_kDQE=GeKxf@mail.gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 16:06:52 +01:00
Peter Zijlstra	c079c791c5	perf, amd: Remove the nb lock Since all the hotplug stuff is serialized by the hotplug mutex, do away with the amd_nb_lock. Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-08 20:16:30 +01:00
Andre Przywara	73c1160ce3	KVM: enlarge number of possible CPUID leaves Currently the number of CPUID leaves KVM handles is limited to 40. My desktop machine (AthlonII) already has 35 and future CPUs will expand this well beyond the limit. Extend the limit to 80 to make room for future processors. KVM-Stable-Tag. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:38 +02:00
Joerg Roedel	24d1b15f72	KVM: SVM: Do not report xsave in supported cpuid To support xsave properly for the guest, the SVM module needs software support for it. As long as this is not present, do not report xsave as a supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:37 +02:00
Sheng Yang	3ea3aa8cf6	KVM: Fix OSXSAVE after migration CPUID's OSXSAVE is a mirror of CR4.OSXSAVE bit. We need to update the CPUID after migration. KVM-Stable-Tag. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-12-08 17:28:31 +02:00
Linus Torvalds	6313e3c217	Merge branches 'x86-fixes-for-linus', 'perf-fixes-for-linus' and 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/pvclock: Zero last_value on resume * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf record: Fix eternal wait for stillborn child perf header: Don't assume there's no attr info if no sample ids is provided perf symbols: Figure out start address of kernel map from kallsyms perf symbols: Fix kallsyms kernel/module map splitting * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: Fix printk_needs_cpu() return value on offline cpus printk: Fix wake_up_klogd() vs cpu hotplug	2010-12-08 06:40:59 -08:00
Ingo Molnar	10a18d7dc0	Merge commit 'v2.6.37-rc5' into perf/core Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-07 07:49:51 +01:00
Feng Tang	991cfffa7c	x86, earlyprintk: Move mrst early console to platform/ and fix a typo Move the code to arch/x86/platform/mrst/. Also fix a typo to use the correct config option: CONFIG_EARLY_PRINTK_MRST Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: alan@linux.intel.com LKML-Reference: <1291348298-21263-1-git-send-email-feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 20:52:04 +01:00
Masami Hiramatsu	f984ba4eb5	kprobes: Use text_poke_smp_batch for unoptimizing Use text_poke_smp_batch() on unoptimization path for reducing the number of stop_machine() issues. If the number of unoptimizing probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes unoptimizes first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining probes. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20101203095434.2961.22657.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:32 +01:00
Masami Hiramatsu	cd7ebe2298	kprobes: Use text_poke_smp_batch for optimizing Use text_poke_smp_batch() in optimization path for reducing the number of stop_machine() issues. If the number of optimizing probes is more than MAX_OPTIMIZE_PROBES(=256), kprobes optimizes first MAX_OPTIMIZE_PROBES probes and kicks optimizer for remaining probes. Changes in v5: - Use kick_kprobe_optimizer() instead of directly calling schedule_delayed_work(). - Rescheduling optimizer outside of kprobe mutex lock. Changes in v2: - Allocate code buffer and parameters in arch_init_kprobes() instead of using static arrays. - Merge previous max optimization limit patch into this patch. So, this patch introduces upper limit of optimization at once. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20101203095428.2961.8994.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:31 +01:00
Masami Hiramatsu	7deb18dcf0	x86: Introduce text_poke_smp_batch() for batch-code modifying Introduce text_poke_smp_batch(). This function modifies several text areas with one stop_machine() on SMP. Because calling stop_machine() is heavy task, it is better to aggregate text_poke requests. ( Note: I've talked with Rusty about this interface, and he would not like to expand stop_machine() interface, since it is not for generic use. ) Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095422.2961.51217.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:31 +01:00
Masami Hiramatsu	6274de4984	kprobes: Support delayed unoptimizing Unoptimization occurs when a probe is unregistered or disabled, and is heavy because it recovers instructions by using stop_machine(). This patch delays unoptimization operations and unoptimize several probes at once by using text_poke_smp_batch(). This can avoid unexpected system slowdown coming from stop_machine(). Changes in v5: - Split this patch into several cleanup patches and this patch. - Fix some text_mutex lock miss. - Use bool instead of int for behavior flags. - Add additional comment for (un)optimizing path. Changes in v2: - Use dynamic allocated buffers and params. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: 2nddept-manager@sdl.hitachi.co.jp LKML-Reference: <20101203095409.2961.82733.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-06 17:59:30 +01:00
Feng Tang	e4d2ebcab1	x86, apbt: Setup affinity for apb timers acting as per-cpu timer Commit `a5ef2e70` "x86: Sanitize apb timer interrupt handling" forgot the affinity setup when cleaning up the code, this patch just adds the forgotten part Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: Jacob Pan <jacob.jun.pan@intel.com> Cc: Alan Cox <alan@linux.intel.com> LKML-Reference: <1291348298-21263-2-git-send-email-feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 15:58:26 +01:00
Dirk Brandewie	5ec6960f6f	ce4100: Add errata fixes for UART on CE4100 This patch enables the UART on the CE4100. The UART has a couple of issues that need to be worked around. First, the UART is mostly PC compatible except that it is clocked eight times faster than a standard PC, so the default configuration provided in arch/x86/include/asm/serial.h needs to be overridden. Second, the TX interrupt may not be set correctly all the time. Lastly, accessing the UART via I/O space for early_printk() hangs the chip when the IOAPIC is enabled. A custom mem_serial_in() is provided to work around the TX interrupt issue. The configuration issues are dealt with in the callback registered with the 8250 driver via serial8250_set_isa_configurator() Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> LKML-Reference: <1290436128-17958-1-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 15:58:26 +01:00
Sebastian Andrzej Siewior	a38c5380ef	x86: io_apic: Split setup_ioapic_ids_from_mpc() Sodaville needs to setup the IO_APIC ids as the boot loader leaves them uninitialized. Split out the setter function so it can be called unconditionally from the sodaville board code. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20101126165020.GA26361@www.tglx.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-06 14:30:28 +01:00
Linus Torvalds	11e8896474	Merge branch '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * '2.6.37-rc4-pvhvm-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: unplug the emulated devices at resume time xen: fix save/restore for PV on HVM guests with pirq remapping xen: resume the pv console for hvm guests too xen: fix MSI setup and teardown for PV on HVM guests xen: use PHYSDEVOP_get_free_pirq to implement find_unbound_pirq	2010-12-03 11:30:57 -08:00
Linus Torvalds	8338fded13	Merge branches 'upstream/core' and 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: allocate irq descs on any NUMA node xen: prevent crashes with non-HIGHMEM 32-bit kernels with largeish memory xen: use default_idle xen: clean up "extra" memory handling some more * 'upstream/bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: x86/32: perform initial startup on initial_page_table xen: don't bother to stop other cpus on shutdown/reboot	2010-12-03 10:08:52 -08:00
John Stultz	08ec0c58fb	x86: Improve TSC calibration using a delayed workqueue Boot to boot the TSC calibration may vary by quite a large amount. While normal variance of 50-100ppm can easily be seen, the quick calibration code only requires 500ppm accuracy, which is the limit of what NTP can correct for. This can cause problems for systems being used as NTP servers, as every time they reboot it can take hours for them to calculate the new drift error caused by the calibration. The classic trade-off here is calibration accuracy vs slow boot times, as during the calibration nothing else can run. This patch uses a delayed workqueue to calibrate the TSC over the period of a second. This allows very accurate calibration (in my tests only varying by 1khz or 0.4ppm boot to boot). Additionally this refined calibration step does not block the boot process, and only delays the TSC clocksource registration by a few seconds in early boot. If the refined calibration strays 1% from the early boot calibration value, the system will fall back to already calculated early boot calibration. Credit to Andi Kleen who suggested using a timer quite a while back, but I dismissed it thinking the timer calibration would be done after the clocksource was registered (which would break things). Forgive me for my short-sightedness. This patch has worked very well in my testing, but TSC hardware is quite varied so it would probably be good to get some extended testing, possibly pushing inclusion out to 2.6.39. Signed-off-by: John Stultz <johnstul@us.ibm.com> LKML-Reference: <1289003985-29060-1-git-send-email-johnstul@us.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@elte.hu> CC: Martin Schwidefsky <schwidefsky@de.ibm.com> CC: Clark Williams <williams@redhat.com> CC: Andi Kleen <andi@firstfloor.org>	2010-12-02 16:48:37 -08:00
John Stultz	b0f969009f	Merge remote branch 'tip/x86/tsc' into fortglx/2.6.38/tip/x86/tsc Conflicts: Documentation/kernel-parameters.txt	2010-12-02 16:47:52 -08:00
Jeremy Fitzhardinge	64141da587	vmalloc: eagerly clear ptes on vunmap On stock 2.6.37-rc4, running: # mount lilith:/export /mnt/lilith # find /mnt/lilith/ -type f -print0 \| xargs -0 file crashes the machine fairly quickly under Xen. Often it results in oops messages, but the couple of times I tried just now, it just hung quietly and made Xen print some rude messages: (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1d7058 (pfn 18fa7) (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp 1000000000000000) for mfn 1d2e04 (pfn 1d1fb) (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04 Which means the domain tried to map a pagetable page RW, which would allow it to map arbitrary memory, so Xen stopped it. This is because vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had finished with them, and those pages got recycled as pagetable pages while still having these RW aliases. Removing those mappings immediately removes the Xen-visible aliases, and so it has no problem with those pages being reused as pagetable pages. Deferring the TLB flush doesn't upset Xen because it can flush the TLB itself as needed to maintain its invariants. When unmapping a region in the vmalloc space, clear the ptes immediately. There's no point in deferring this because there's no amortization benefit. The TLBs are left dirty, and they are flushed lazily to amortize the cost of the IPIs. This specific motivation for this patch is an oops-causing regression since 2.6.36 when using NFS under Xen, triggered by the NFS client's use of vm_map_ram() introduced in `56e4ebf877` ("NFS: readdir with vmapped pages") . XFS also uses vm_map_ram() and could cause similar problems. 
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-12-02 14:51:15 -08:00
Stefano Stabellini	512b109ec9	xen: unplug the emulated devices at resume time Early after being resumed we need to unplug again the emulated devices. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-12-02 14:40:53 +00:00
Stefano Stabellini	af42b8d12f	xen: fix MSI setup and teardown for PV on HVM guests When remapping MSIs into pirqs for PV on HVM guests, qemu is responsible for doing the actual mapping and unmapping. We only give qemu the desired pirq number when we ask to do the mapping the first time, after that we should be reading back the pirq number from qemu every time we want to re-enable the MSI. This fixes a bug in xen_hvm_setup_msi_irqs that manifests itself when trying to enable the same MSI for the second time: the old MSI to pirq mapping is still valid at this point but xen_hvm_setup_msi_irqs would try to assign a new pirq anyway. A simple way to reproduce this bug is to assign an MSI capable network card to a PV on HVM guest, if the user brings down the corresponding ethernet interface and up again, Linux would fail to enable MSIs on the device. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-12-02 14:34:25 +00:00
Ian Campbell	805e3f4950	xen: x86/32: perform initial startup on initial_page_table Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both initial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-29 17:07:53 -08:00
Jeremy Fitzhardinge	31e323cca9	xen: don't bother to stop other cpus on shutdown/reboot Xen will shoot all the VCPUs when we do a shutdown hypercall, so there's no need to do it manually. In any case it will fail because all the IPI irqs have been pulled down by this point, so the cross-CPU calls will simply hang forever. Until change `76fac077db` the function calls were not synchronously waited for, so this wasn't apparent. However after that change the calls became synchronous leading to a hang on shutdown on multi-VCPU guests. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Cc: Alok Kataria <akataria@vmware.com>	2010-11-29 14:16:53 -08:00
Mathias Krause	559ad0ff13	crypto: aesni-intel - Fixed build error on x86-32 Exclude AES-GCM code for x86-32 due to heavy usage of 64-bit registers not available on x86-32. While at it, fixed unregister order in aesni_exit(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-11-29 08:35:39 +08:00
Linus Torvalds	a9e40a2493	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Fix the software context switch counter perf, x86: Fixup Kconfig deps x86, perf, nmi: Disable perf if counters are not accessible perf: Fix inherit vs. context rotation bug	2010-11-28 12:25:02 -08:00
Jeremy Fitzhardinge	e7a3481c02	x86/pvclock: Zero last_value on resume If the guest domain has been suspended/resumed or migrated, then the system clock backing the pvclock clocksource may revert to a smaller value (ie, can be non-monotonic across the migration/save-restore). Make sure we zero last_value in that case so that the domain continues to see clock updates. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-28 09:33:20 +01:00
Mathias Krause	0d258efb6a	crypto: aesni-intel - Ported implementation to x86-32 The AES-NI instructions are also available in legacy mode so the 32-bit architecture may profit from those, too. To illustrate the performance gain here's a short summary of a dm-crypt speed test on a Core i7 M620 running at 2.67GHz comparing both assembler implementations: x86: i586 aes-ni delta ECB, 256 bit: 93.8 MB/s 123.3 MB/s +31.4% CBC, 256 bit: 84.8 MB/s 262.3 MB/s +209.3% LRW, 256 bit: 108.6 MB/s 222.1 MB/s +104.5% XTS, 256 bit: 105.0 MB/s 205.5 MB/s +95.7% Additionally, due to some minor optimizations, the 64-bit version also got a minor performance gain as seen below: x86-64: old impl. new impl. delta ECB, 256 bit: 121.1 MB/s 123.0 MB/s +1.5% CBC, 256 bit: 285.3 MB/s 290.8 MB/s +1.9% LRW, 256 bit: 263.7 MB/s 265.3 MB/s +0.6% XTS, 256 bit: 251.1 MB/s 255.3 MB/s +1.7% Signed-off-by: Mathias Krause <minipli@googlemail.com> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-11-27 16:34:46 +08:00
Linus Torvalds	fbe6c4047f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: dmar, x86: Use function stubs when CONFIG_INTR_REMAP is disabled x86-64: Fix and clean up AMD Fam10 MMCONF enabling x86: UV: Address interrupt/IO port operation conflict x86: Use online node real index in calulate_tbl_offset() x86, asm: Fix binutils 2.15 build failure	2010-11-27 07:28:47 +09:00
Linus Torvalds	d2f30c73ab	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf symbols: Remove incorrect open-coded container_of() perf record: Handle restrictive permissions in /proc/{kallsyms,modules} x86/kprobes: Prevent kprobes to probe on save_args() irq_work: Drop cmpxchg() result perf: Fix owner-list vs exit x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG tracing: Fix recursive user stack trace perf,hw_breakpoint: Initialize hardware api earlier x86: Ignore trap bits on single step exceptions tracing: Force arch_local_irq_* notrace for paravirt tracing: Fix module use of trace_bprintk()	2010-11-27 07:28:17 +09:00
Cyrill Gorcunov	af86da5318	perf, x86: P4 PMU - describe config format Add a description of .config for the sake of RAW events. At least this should bring some light to those who will be reading this code. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: Stephane Eranian <eranian@google.com> Cc: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:57 +01:00
Peter Zijlstra	004417a6d4	perf, arch: Cleanup perf-pmu init vs lockup-detector The perf hardware pmu got initialized at various points in the boot, some before early_initcall() some after (notably arch_initcall). The problem is that the NMI lockup detector is run from early_initcall() and expects the hardware pmu to be present. Sanitize this by moving all architecture hardware pmu implementations to initialize at early_initcall() and move the lockup detector to an explicit initcall right after that. Cc: paulus <paulus@samba.org> Cc: davem <davem@davemloft.net> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Acked-by: Paul Mundt <lethal@linux-sh.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290707759.2145.119.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:56 +01:00
Andi Kleen	5ef428c4b5	x86: Set cpu masks before calling CPU_STARTING notifiers When booting up a CPU set the various topology masks before calling the CPU_STARTING notifier. This way the notifier can actually use the masks. This is needed for a perf change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290077254-12165-2-git-send-email-andi@firstfloor.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:56 +01:00
Franck Bui-Huu	6c7e550f13	perf: Introduce is_sampling_event() and use it when appropriate. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1290525705-6265-1-git-send-email-fbuihuu@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:14:54 +01:00
Ingo Molnar	6c869e772c	Merge branch 'perf/urgent' into perf/core Conflicts: arch/x86/kernel/apic/hw_nmi.c Merge reason: Resolve conflict, queue up dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:07:02 +01:00
Ingo Molnar	e4e91ac410	Merge commit 'v2.6.37-rc3' into perf/core Merge reason: Pick up latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:04:47 +01:00
Peter Zijlstra	cc2067a514	perf, x86: Fixup Kconfig deps This leads to a Kconfig dep inversion, x86 selects PERF_EVENT (due to a hw_breakpoint dep) but doesn't unconditionally provide HAVE_PERF_EVENT. (This can cause build failures on M386/M486 kernel .config's.) Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20101117222055.982965150@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:00:58 +01:00
Don Zickus	33c6d6a7ad	x86, perf, nmi: Disable perf if counters are not accessible In kvm virt guests, the perf counters are not emulated. Instead they return zero on a rdmsrl. The perf nmi handler uses the fact that crossing a zero means the counter overflowed (for those counters that do not have specific interrupt bits). Therefore on kvm guests, perf will swallow all NMIs thinking the counters overflowed. This causes problems for subsystems like kgdb which need NMIs to do their magic. This problem was discovered by running kgdb tests. The solution is to write garbage into a perf counter during the initialization and hopefully read back the same number. On kvm guests, the value will be read back as zero and we disable perf as a result. Reported-by: Jason Wessel <jason.wessel@windriver.com> Patch-inspired-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <1290462923-30734-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-26 15:00:57 +01:00
Linus Torvalds	8a3fbc9fdb	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: remove duplicated #include xen: x86/32: perform initial startup on initial_page_table	2010-11-25 08:35:53 +09:00
Andrew Morton	91d95fda85	arch/x86/include/asm/fixmap.h: mark __set_fixmap_offset as __always_inline When compiling arch/x86/kernel/early_printk_mrst.c with i386 allmodconfig, gcc-4.1.0 generates an out-of-line copy of __set_fixmap_offset() which contains a reference to __this_fixmap_does_not_exist which the compiler cannot elide. Marking __set_fixmap_offset() as __always_inline prevents this. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Feng Tang <feng.tang@intel.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-25 06:50:49 +09:00
Huang Weiyi	e6d4a76dbf	xen: remove duplicated #include Remove duplicated #include('s) in arch/x86/xen/setup.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-11-24 12:07:45 -05:00
Ian Campbell	5b5c1af104	xen: x86/32: perform initial startup on initial_page_table Only make swapper_pg_dir readonly and pinned when generic x86 architecture code (which also starts on initial_page_table) switches to it. This helps ensure that the generic setup paths work on Xen unmodified. In particular clone_pgd_range writes directly to the destination pgd entries and is used to initialise swapper_pg_dir so we need to ensure that it remains writeable until the last possible moment during bring up. This is complicated slightly by the need to avoid sharing kernel PMD entries when running under Xen, therefore the Xen implementation must make a copy of the kernel PMD (which is otherwise referred to by both initial_page_table and swapper_pg_dir) before switching to swapper_pg_dir. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-11-24 12:07:44 -05:00
Linus Torvalds	a4ec046c98	Merge branch 'upstream/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (23 commits) xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs. xen: set IO permission early (before early_cpu_init()) xen: re-enable boot-time ballooning xen/balloon: make sure we only include remaining extra ram xen/balloon: the balloon_lock is useless xen: add extra pages to balloon xen: make evtchn's name less generic xen/evtchn: the evtchn device is non-seekable Revert "xen/privcmd: create address space to allow writable mmaps" xen/events: use locked set\|clear_bit() for cpu_evtchn_mask xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore xen/xenfs: update xenfs_mount for new prototype xen: fix header export to userspace xen: implement XENMEM_machphys_mapping xen: set vma flag VM_PFNMAP in the privcmd mmap file_op xen: xenfs: privcmd: check put_user() return code xen/evtchn: add missing static xen/evtchn: Fix name of Xen event-channel device xen/evtchn: don't do unbind_from_irqhandler under spinlock xen/evtchn: remove spurious barrier ...	2010-11-24 08:23:18 +09:00
Jeremy Fitzhardinge	bc15fde77f	xen: use default_idle We just need the idle loop to drop into safe_halt, which default_idle() is perfectly capable of doing. There's no need to duplicate it. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-22 17:19:34 -08:00
Jeremy Fitzhardinge	c2d0879112	xen: clean up "extra" memory handling some more Make sure that extra_pages is added for all E820_RAM regions beyond mem_end - completely excluded regions as well as the remains of partially included regions. Also makes sure the extra region is not unnecessarily high, and simplifies the logic to decide which regions should be added. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-22 16:34:28 -08:00
Jeremy Fitzhardinge	9b8321531a	Merge branches 'upstream/core', 'upstream/xenfs' and 'upstream/evtchn' into upstream/for-linus * upstream/core: xen/events: Use PIRQ instead of GSI value when unmapping MSI/MSI-X irqs. xen: set IO permission early (before early_cpu_init()) xen: re-enable boot-time ballooning xen/balloon: make sure we only include remaining extra ram xen/balloon: the balloon_lock is useless xen: add extra pages to balloon xen/events: use locked set\|clear_bit() for cpu_evtchn_mask xen/evtchn: clear secondary CPUs' cpu_evtchn_mask[] after restore xen: implement XENMEM_machphys_mapping * upstream/xenfs: Revert "xen/privcmd: create address space to allow writable mmaps" xen/xenfs: update xenfs_mount for new prototype xen: fix header export to userspace xen: set vma flag VM_PFNMAP in the privcmd mmap file_op xen: xenfs: privcmd: check put_user() return code * upstream/evtchn: xen: make evtchn's name less generic xen/evtchn: the evtchn device is non-seekable xen/evtchn: add missing static xen/evtchn: Fix name of Xen event-channel device xen/evtchn: don't do unbind_from_irqhandler under spinlock xen/evtchn: remove spurious barrier xen/evtchn: ports start enabled xen/evtchn: dynamically allocate port_user array xen/evtchn: track enabled state for each port	2010-11-22 12:22:42 -08:00
Konrad Rzeszutek Wilk	ec35a69c46	xen: set IO permission early (before early_cpu_init()) This patch is based off "xen dom0: Set up basic IO permissions for dom0." by Juan Quintela <quintela@redhat.com>. On AMD machines when we boot the kernel as Domain 0 we get this nasty: mapping kernel into physical memory Xen: setup ISA identity maps about to get started... (XEN) traps.c:475:d0 Unhandled general protection fault fault/trap [#13] on VCPU 0 [ec=0000] (XEN) domain_crash_sync called from entry.S (XEN) Domain 0 (vcpu#0) crashed on cpu#0: (XEN) ----[ Xen-4.1-101116 x86_64 debug=y Not tainted ]---- (XEN) CPU: 0 (XEN) RIP: e033:[<ffffffff8130271b>] (XEN) RFLAGS: 0000000000000282 EM: 1 CONTEXT: pv guest (XEN) rax: 000000008000c068 rbx: ffffffff8186c680 rcx: 0000000000000068 (XEN) rdx: 0000000000000cf8 rsi: 000000000000c000 rdi: 0000000000000000 (XEN) rbp: ffffffff81801e98 rsp: ffffffff81801e50 r8: ffffffff81801eac (XEN) r9: ffffffff81801ea8 r10: ffffffff81801eb4 r11: 00000000ffffffff (XEN) r12: ffffffff8186c694 r13: ffffffff81801f90 r14: ffffffffffffffff (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0 (XEN) cr3: 0000000221803000 cr2: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e02b cs: e033 (XEN) Guest stack trace from rsp=ffffffff81801e50: RIP points to read_pci_config() function. The issue is that we don't set IO permissions for the Linux kernel early enough. The call sequence used to be: xen_start_kernel() x86_init.oem.arch_setup = xen_setup_arch; setup_arch: - early_cpu_init - early_init_amd - read_pci_config - x86_init.oem.arch_setup [ xen_arch_setup ] - set IO permissions. We need to set the IO permissions earlier on, which this patch does. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-11-22 12:10:31 -08:00
Lin Ming	691513f70d	x86: Resume trampoline must be executable commit 5bd5a452(x86: Add NX protection for kernel data) marked the trampoline area NX - which unsurprisingly breaks resume and cpu hotplug. Revert the portion of that commit, which touches the trampoline. Originally-from: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com> Cc: Matthieu Castet <castet.matthieu@free.fr> Cc: Siarhei Liakh <sliakh.lkml@gmail.com> Cc: Xuxian Jiang <jiang@cs.ncsu.edu> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Tested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-22 14:38:52 +01:00
Thomas Gleixner	9cdca86972	x86: platform: Move iris to x86/platform where it belongs Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-20 10:37:05 +01:00
Jeremy Fitzhardinge	d2a817130c	xen: re-enable boot-time ballooning Now that the balloon driver doesn't stumble over non-RAM pages, we can enable the extra space for ballooning. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-19 23:28:08 -08:00
Vasiliy Kulikov	5ca9afdb9f	x86, mrst: Check platform_device_register() return code platform_device_register() may fail, if so propagate the return code from mrst_device_create(). Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> LKML-Reference: <1290104207-31279-1-git-send-email-segoon@openwall.com> Acked-by: Alan Cox <alan@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-18 13:45:46 -08:00
Ingo Molnar	ae51ce9061	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-11-18 20:07:12 +01:00
Linus Torvalds	fb3ff69d13	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: VMX: Fix host userspace gsbase corruption KVM: Correct ordering of ldt reload wrt fs/gs reload	2010-11-18 09:45:47 -08:00
Linus Torvalds	2d42dc3feb	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb,ppc: Fix regression in evr register handling kgdb,x86: fix regression in detach handling kdb: fix crash when KDB_BASE_CMD_MAX is exceeded kdb: fix memory leak in kdb_main.c	2010-11-18 08:24:58 -08:00
Hans Rosenfeld	f658bcfb26	x86, cacheinfo: Cleanup L3 cache index disable support Adaptions to the changes of the AMD northbridge caching code: instead of a bool in each l3 struct, use a flag in amd_northbridges.flags to indicate L3 cache index disable support; use a pointer to the whole northbridge instead of the misc device in the l3 struct; simplify the initialisation; dynamically generate sysfs attribute array. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:06 +01:00
Hans Rosenfeld	9653a5c76c	x86, amd-nb: Cleanup AMD northbridge caching code Support more than just the "Misc Control" part of the northbridges. Support more flags by turning "gart_supported" into a single bit flag that is stored in a flags member. Clean up related code by using a set of functions (amd_nb_num(), amd_nb_has_feature() and node_to_amd_nb()) instead of accessing the NB data structures directly. Reorder the initialization code and put the GART flush words caching in a separate function. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:05 +01:00
Hans Rosenfeld	eec1d4fa00	x86, amd-nb: Complete the rename of AMD NB and related code Not only the naming of the files was confusing, it was even more so for the function and variable names. Renamed the K8 NB and NUMA stuff that is also used on other AMD platforms. This also renames the CONFIG_K8_NUMA option to CONFIG_AMD_NUMA and the related file k8topology_64.c to amdtopology_64.c. No functional changes intended. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:04 +01:00
Soeren Sandmann Pedersen	9c0729dc80	x86: Eliminate bp argument from the stack tracing routines The various stack tracing routines take a 'bp' argument in which the caller is supposed to provide the base pointer to use, or 0 if it doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not defined, this means all callers in principle should either always pass 0, or be conditional on CONFIG_FRAME_POINTER. However, there are only really three use cases for stack tracing: (a) Trace the current task, including IRQ stack if any (b) Trace the current task, but skip IRQ stack (c) Trace some other task In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just be 0. If it _is_ defined, then - in case (a) bp should be gotten directly from the CPU's register, so the caller should pass NULL for regs, - in case (b) the caller should pass the IRQ registers to dump_trace(), - in case (c) bp should be gotten from the top of the task's stack, so the caller should pass NULL for regs. Hence, the bp argument is not necessary because the combination of task and regs is sufficient to determine an appropriate value for bp. This patch introduces a new inline function stack_frame(task, regs) that computes the desired bp. This function is then called from the two versions of dump_stack(). Signed-off-by: Soren Sandmann <ssp@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-11-18 14:37:34 +01:00
Jan Beulich	37db6c8f1d	x86-64: Fix and clean up AMD Fam10 MMCONF enabling Candidate memory ranges were not calculated properly (start addresses got needlessly rounded down, and end addresses didn't get rounded up at all), address comparison for secondary CPUs was done on only part of the address, and disabled status wasn't tracked properly. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <4CE24DF40200007800022737@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:41:35 +01:00
Masami Hiramatsu	de31ec8a31	x86/kprobes: Prevent kprobes to probe on save_args() Prevent kprobes from probing save_args(), since this function is called from the breakpoint exception handler. Probing it would cause an infinite loop in breakpoint handling. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> LKML-Reference: <20101118101655.2779.2816.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:40:19 +01:00
matthieu castet	84e1c6bb38	x86: Add RO/NX protection for loadable kernel modules This patch is a logical extension of the protection provided by CONFIG_DEBUG_RODATA to LKMs. The protection is provided by splitting module_core and module_init into three logical parts each and setting appropriate page access permissions for each individual section: 1. Code: RO+X 2. RO data: RO+NX 3. RW data: RW+NX In order to achieve proper protection, layout_sections() has been modified to align each of the three parts mentioned above onto a page boundary. Next, the corresponding page access permissions are set right before successful exit from load_module(). Further, free_module() and sys_init_module have been modified to set module_core and module_init as RW+NX right before calling module_free(). By default, the original section layout and access flags are preserved. When compiled with CONFIG_DEBUG_SET_MODULE_RONX=y, the patch will page-align each group of sections to ensure that each page contains only one type of content and will enforce RO/NX for each group of pages. -v1: Initial proof-of-concept patch. -v2: The patch has been re-written to reduce the number of #ifdefs and to make it architecture-agnostic. Code formatting has also been corrected. -v3: Opportunistic RO/NX protection is now unconditional. Section page-alignment is enabled when CONFIG_DEBUG_RODATA=y. -v4: Removed most macros and improved coding style. -v5: Changed page-alignment and RO/NX section size calculation -v6: Fixed comments.
Restricted RO/NX enforcement to x86 only -v7: Introduced CONFIG_DEBUG_SET_MODULE_RONX, added calls to set_all_modules_text_rw() and set_all_modules_text_ro() in ftrace -v8: updated for compatibility with linux 2.6.33-rc5 -v9: coding style fixes -v10: more coding style fixes -v11: minor adjustments for -tip -v12: minor adjustments for v2.6.35-rc2-tip -v13: minor adjustments for v2.6.37-rc1-tip Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F914.9070106@free.fr> [ minor cleanliness edits, -v14: build failure fix ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 13:32:56 +01:00
Matthieu Castet	5bd5a45266	x86: Add NX protection for kernel data This patch expands functionality of CONFIG_DEBUG_RODATA to set main (static) kernel data area as NX. The following steps are taken to achieve this: 1. Linker script is adjusted so .text always starts and ends on a page boundary 2. Linker script is adjusted so .rodata always starts and ends on a page boundary 3. NX is set for all pages from _etext through _end in mark_rodata_ro. 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c 5. bios rom is set to x when pcibios is used. The results of patch application may be observed in the diff of kernel page table dumps: pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc00a0000 640K RW GLB NX pte +0xc00a0000-0xc0100000 384K RW GLB x pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte No pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc0100000 1M RW GLB NX pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte The patch was originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang
<jiang@cs.ncsu.edu>. -v1: initial patch for 2.6.30 -v2: patch for 2.6.31-rc7 -v3: moved all code into arch/x86, adjusted credits -v4: fixed ifdef, removed credits from CREDITS -v5: fixed an address calculation bug in mark_nxdata_nx() -v6: added acked-by and PT dump diff to commit log -v7: minor adjustments for -tip -v8: rework with the merge of "Set first MB as RW+NX" Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 12:52:04 +01:00
matthieu castet	64edc8ed5f	x86: Fix improper large page preservation This patch fixes a bug in try_preserve_large_page() which may result in improper large page preservation and improper application of page attributes to the memory area outside of the original change request. More specifically, the problem manifests itself when set_memory_*() is called for several pages at the beginning of the large page and try_preserve_large_page() erroneously concludes that the change can be applied to whole large page. The fix consists of 3 parts: 1. Addition of "required" protection attributes in static_protections(), so .data and .bss can be guaranteed to stay "RW" 2. static_protections() is now called for every small page within large page to determine compatibility of new protection attributes (instead of just small pages within the requested range). 3. Large page can be preserved only if attribute change is large-page-aligned and covers whole large page. -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2 -v2: Replaced pfn check with address check for kernel rw-data Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F7F3.8030809@free.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 12:52:04 +01:00
Dimitri Sivanich	8191c9f692	x86: UV: Address interrupt/IO port operation conflict This patch for SGI UV systems addresses a problem whereby interrupt transactions being looped back from a local IOH, through the hub to a local CPU can (erroneously) conflict with IO port operations and other transactions. To work around this we set a high bit in the APIC IDs used for interrupts. This bit appears to be ignored by the sockets, but it avoids the conflict in the hub. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> LKML-Reference: <20101116222352.GA8155@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> ___ arch/x86/include/asm/uv/uv_hub.h \| 4 ++++ arch/x86/include/asm/uv/uv_mmrs.h \| 19 ++++++++++++++++++- arch/x86/kernel/apic/x2apic_uv_x.c \| 25 +++++++++++++++++++++++-- arch/x86/platform/uv/tlb_uv.c \| 2 +- arch/x86/platform/uv/uv_time.c \| 4 +++- 5 files changed, 49 insertions(+), 5 deletions(-)	2010-11-18 10:41:25 +01:00
Ingo Molnar	fcf48a725a	Merge branch 'perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent	2010-11-18 10:37:51 +01:00
Yinghai Lu	9223081f54	x86: Use online node real index in calulate_tbl_offset() Found a NUMA system that doesn't have RAM installed at the first socket, which hangs while executing init scripts. Bisected it to: \| commit `9329672021` \| Author: Shaohua Li <shaohua.li@intel.com> \| Date: Wed Oct 20 11:07:03 2010 +0800 \| \| x86: Spread tlb flush vector between nodes It turns out that when the first socket is not online, cpus on node1 could have tlb_offset set to a value bigger than NUM_INVALIDATE_TLB_VECTORS. That could also affect systems with, for example, 4 sockets where socket 2 doesn't have RAM installed: socket 3 will get a too-big tlb_offset. Need to use the real online node idx. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Shaohua Li <shaohua.li@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CDEDE59.40603@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 10:10:50 +01:00
Shérab	82148d1d0b	x86/platform: Add Eurobraille/Iris power off support The Iris machines from Eurobraille do not have APM or ACPI support to shut themselves down properly. A special I/O sequence is needed to do so. This module runs this I/O sequence at kernel shutdown when its force parameter is set to 1. Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> [ did minor coding style edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 10:03:24 +01:00
Kees Cook	79250af2d5	x86: Fix included-by file reference comments Adjust the paths for files that are including verify_cpu.S. Reported-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Kees Cook <kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> LKML-Reference: <1289931004-16066-1-git-send-email-kees.cook@canonical.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:58:54 +01:00
Tetsuo Handa	96e612ffc3	x86, asm: Fix binutils 2.15 build failure Add parentheses around one pushl_cfi argument. Commit `df5d1874` "x86: Use {push,pop}{l,q}_cfi in more places" caused GNU assembler 2.15 (Debian Sarge) to fail. It is still failing as of commit `07bd8516` "x86, asm: Restore parentheses around one pushl_cfi argument". This patch solves build failure with GNU assembler 2.15. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Jan Beulich <jbeulich@novell.com> Cc: heukelum@fastmail.fm Cc: hpa@linux.intel.com LKML-Reference: <201011160445.oAG4jGif079860@www262.sakura.ne.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:25:11 +01:00
Rakib Mullick	0e2af2a9ab	x86, hw_nmi: Move backtrace_mask declaration under ARCH_HAS_NMI_WATCHDOG backtrace_mask has been used under the code context of ARCH_HAS_NMI_WATCHDOG. So put it into that context. We were warned by the following warning: arch/x86/kernel/apic/hw_nmi.c:21: warning: ‘backtrace_mask’ defined but not used Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Don Zickus <dzickus@redhat.com> LKML-Reference: <1289573455-3410-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:15:12 +01:00
Don Zickus	072b198a4a	x86, nmi_watchdog: Remove all stub function calls from old nmi_watchdog Now that the bulk of the old nmi_watchdog is gone, remove all the stub variables and hooks associated with it. This touches lots of files mainly because of how the io_apic nmi_watchdog was implemented. Now that the io_apic nmi_watchdog is forever gone, remove all its fingers. Most of this code was not being exercised by virtue of nmi_watchdog != NMI_IO_APIC, so there shouldn't be anything too risky here. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-3-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:08:23 +01:00
Don Zickus	5f2b0ba4d9	x86, nmi_watchdog: Remove the old nmi_watchdog Now that we have a new nmi_watchdog that is more generic and sits on top of the perf subsystem, we really do not need the old nmi_watchdog any more. In addition, the old nmi_watchdog doesn't really work if you are using the default clocksource, hpet. The old nmi_watchdog code relied on local apic interrupts to determine if the cpu is still alive. With hpet as the clocksource, these interrupts don't increment any more and the old nmi_watchdog triggers false positives. This piece removes the old nmi_watchdog code and stubs out any variables and function calls. The stubs are the same ones used by the new nmi_watchdog code, so it should be well tested. Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: gorcunov@openvz.org LKML-Reference: <1289578944-28564-2-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 09:08:23 +01:00
Ingo Molnar	a89d4bd055	Merge branch 'tip/perf/urgent-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent	2010-11-18 08:07:36 +01:00
Avi Kivity	c8770e7ba6	KVM: VMX: Fix host userspace gsbase corruption We now use load_gs_index() to load gs safely; unfortunately this also changes MSR_KERNEL_GS_BASE, which we managed separately. This resulted in confusion and breakage running 32-bit host userspace on a 64-bit kernel. Fix by - saving guest MSR_KERNEL_GS_BASE before we reload the host's gs - doing the host save/load unconditionally, instead of only when in guest long mode Things can be cleaned up further, but this is the minimal fix for now. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:48:05 -02:00
Avi Kivity	0a77fe4c18	KVM: Correct ordering of ldt reload wrt fs/gs reload If fs or gs refer to the ldt, they must be reloaded after the ldt. Reorder the code to that effect. Userspace code that uses the ldt with kvm is nonexistent, so this doesn't fix a user-visible bug. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-17 19:47:59 -02:00
Jason Wessel	10a6e67648	kgdb,x86: fix regression in detach handling The fix from `ba773f7c51` (x86,kgdb: Fix hw breakpoint regression) was not entirely complete. The kgdb_remove_all_hw_break() function also needs to call the hw_break_release_slot() or else a breakpoint can get activated again after the debugger has detached. The kgdb test suite exposes the behavior in the form of either a hang or repetitive failure. The kernel config that exposes the problem contains all of the following: CONFIG_DEBUG_RODATA=y CONFIG_KGDB_TESTS=y CONFIG_KGDB_TESTS_ON_BOOT=y CONFIG_KGDB_TESTS_BOOT_STRING="V1F100" Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Tested-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-11-17 13:54:57 -06:00
Arnd Bergmann	451a3c24b0	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-11-17 08:59:32 -08:00
Randy Dunlap	ad02519a0d	x86, mrst: Fix dependencies of "select INTEL_SCU_IPC" commit `b9fc71f47` (x86, mrst: The shutdown for MRST requires the SCU IPC mechanism) introduced the following warning: warning: (X86_MRST && PCI && PCI_GOANY && X86_32 && X86_EXTENDED_PLATFORM && X86_IO_APIC) selects INTEL_SCU_IPC which has unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_MRST) which is due to the hierarchical menu structure. Select X86_PLATFORM_DEVICES as well. Originally-from: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <20101115101406.77e072ef.randy.dunlap@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>	2010-11-17 13:41:44 +01:00
Alan Cox	b9fc71f47d	x86, mrst: The shutdown for MRST requires the SCU IPC mechanism Fix the build failure reported by Randy. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101115173110.6877.83958.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-17 13:27:44 +01:00
Jeremy Fitzhardinge	20b4755e4f	Merge commit 'v2.6.37-rc2' into upstream/xenfs * commit 'v2.6.37-rc2': (10093 commits) Linux 2.6.37-rc2 capabilities/syslog: open code cap_syslog logic to fix build failure i2c: Sanity checks on adapter registration i2c: Mark i2c_adapter.id as deprecated i2c: Drivers shouldn't include <linux/i2c-id.h> i2c: Delete unused adapter IDs i2c: Remove obsolete cleanup for clientdata include/linux/kernel.h: Move logging bits to include/linux/printk.h Fix gcc 4.5.1 miscompiling drivers/char/i8k.c (again) hwmon: (w83795) Check for BEEP pin availability hwmon: (w83795) Clear intrusion alarm immediately hwmon: (w83795) Read the intrusion state properly hwmon: (w83795) Print the actual temperature channels as sources hwmon: (w83795) List all usable temperature sources hwmon: (w83795) Expose fan control method hwmon: (w83795) Fix fan control mode attributes hwmon: (lm95241) Check validity of input values hwmon: Change mail address of Hans J. Koch PCI: sysfs: fix printk warnings GFS2: Fix inode deallocation race ...	2010-11-16 11:06:22 -08:00
Linus Torvalds	e5c13537b0	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: sysfs: fix printk warnings PCI: fix pci_bus_alloc_resource() hang, prefer positive decode PCI: read current power state at enable time PCI: fix size checks for mmap() on /proc/bus/pci files x86/PCI: coalesce overlapping host bridge windows PCI hotplug: ibmphp: Add check to prevent reading beyond mapped area	2010-11-15 14:01:33 -08:00
Tadeusz Struk	0bd82f5f63	crypto: aesni-intel - RFC4106 AES-GCM Driver Using Intel New Instructions This patch adds an optimized RFC4106 AES-GCM implementation for 64-bit kernels. It supports 128-bit AES key size. This leverages the crypto AEAD interface type to facilitate a combined AES & GCM operation to be implemented in assembly code. The assembly code leverages Intel(R) AES New Instructions and the PCLMULQDQ instruction. Signed-off-by: Adrian Hoban <adrian.hoban@intel.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: Gabriele Paoloni <gabriele.paoloni@intel.com> Signed-off-by: Aidan O'Mahony <aidan.o.mahony@intel.com> Signed-off-by: Erdinc Ozturk <erdinc.ozturk@intel.com> Signed-off-by: James Guilford <james.guilford@intel.com> Signed-off-by: Wajdi Feghali <wajdi.k.feghali@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2010-11-13 21:47:55 +09:00
Linus Torvalds	891cbd30ef	Merge branch 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen: do not release any memory under 1M in domain 0 xen: events: do not unmask event channels on resume xen: correct size of level2_kernel_pgt	2010-11-12 16:01:55 -08:00
Linus Torvalds	b5c5510436	Merge branch 'stable/xen-pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/xen-pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: MAINTAINERS: Mark XEN lists as moderated xen-pcifront: fix PCI reference leak xen-pcifront: Remove duplicate inclusion of headers. xen: fix memory leak in Xen PCI MSI/MSI-X allocator. MAINTAINERS: Update mailing list name for Xen pieces.	2010-11-12 15:54:39 -08:00
Ian Campbell	7e77506a59	xen: implement XENMEM_machphys_mapping This hypercall allows Xen to specify a non-default location for the machine to physical mapping. This capability is used when running a 32 bit domain 0 on a 64 bit hypervisor to shrink the hypervisor hole to exactly the size required. [ Impact: add Xen hypercall definitions ] Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-11-12 15:00:06 -08:00
Linus Torvalds	25a34554d6	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pvclock: Remove leftover scale_delta() function x86, apic: Remove double #include x86: Adjust section annotations in AMD Fam10 MMCONF enabling code x86, UV: Update node controller MMRs x86: Remove unnecessary casts of void ptr returning alloc function return values x86: Address gcc4.6 "set but not used" warnings in apic.h x86, mm: Fix section mismatch in tlb.c	2010-11-12 08:40:23 -08:00
Frederic Weisbecker	6c0aca288e	x86: Ignore trap bits on single step exceptions When a single step exception fires, the trap bits, used to signal hardware breakpoints, are in a random state. These trap bits might be set if another exception will follow, like a breakpoint in the next instruction, or a watchpoint in the previous one. Or there can be any junk there. So if we handle these trap bits during the single step exception, we are going to handle an exception twice, or we are going to handle junk. Just ignore them in this case. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=21332 Reported-by: Michael Stefaniuc <mstefani@redhat.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Maciej Rutecki <maciej.rutecki@gmail.com> Cc: Alexandre Julliard <julliard@winehq.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: All since 2.6.33.x <stable@kernel.org>	2010-11-12 14:51:01 +01:00
Dirk Brandewie	37bc9f5078	x86: Ce4100: Add reboot_fixup() for CE4100 This patch adds the CE4100 reboot fixup to reboot_fixups_32.c [ tglx: Moved PCI id to reboot_fixups_32.c ] Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> LKML-Reference: <5bdcfb4f0206fa721570504e95659a03b815bc5e.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-12 00:45:41 +01:00
Dirk Brandewie	91d8037f56	ce4100: Add PCI register emulation for CE4100 This patch provides access methods for PCI registers that mis-behave on the CE4100. Each register can be assigned a private init, read and write routine. The exception to this is the bridge device. The bridge device is the only device on bus zero (0) that requires any fixup so it is a special case. [ tglx: minor coding style cleanups, __init annotation and simplification of ce4100_conf_read/write ] Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com> LKML-Reference: <40b6751381c2275dc359db5a17989cce22ad8db7.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-12 00:45:41 +01:00
Thomas Gleixner	c751e17b53	x86: Add CE4100 platform support Add CE4100 platform support. CE4100 needs early setup like moorestown. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Dirk Brandewie <dirk.brandewie@gmail.com> LKML-Reference: <94720fd7f5564a12ebf202cf2c4f4c0d619aab35.1289331834.git.dirk.brandewie@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-12 00:45:41 +01:00
Stefano Stabellini	e060e7af98	xen: set vma flag VM_PFNMAP in the privcmd mmap file_op Set VM_PFNMAP in the privcmd mmap file_op, rather than later in xen_remap_domain_mfn_range when it is too late because vma_wants_writenotify has already been called and vm_page_prot has already been modified. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-11 12:37:43 -08:00
Bjorn Helgaas	4723d0f2f9	x86/PCI: coalesce overlapping host bridge windows Some BIOSes provide PCI host bridge windows that overlap, e.g., pci_root PNP0A03:00: host bridge window [mem 0xb0000000-0xffffffff] pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xdfffffff] pci_root PNP0A03:00: host bridge window [mem 0xf0000000-0xffffffff] If we simply insert these as children of iomem_resource, the second window fails because it conflicts with the first, and the third is inserted as a child of the first, i.e., b0000000-ffffffff PCI Bus 0000:00 f0000000-ffffffff PCI Bus 0000:00 When we claim PCI device resources, this can cause collisions like this if we put them in the first window: pci 0000:00:01.0: address space collision: [mem 0xff300000-0xff4fffff] conflicts with PCI Bus 0000:00 [mem 0xf0000000-0xffffffff] Host bridge windows are top-level resources by definition, so it doesn't make sense to make the third window a child of the first. This patch coalesces any host bridge windows that overlap. For the example above, the result is this single window: pci_root PNP0A03:00: host bridge window [mem 0xafffffff-0xffffffff] This fixes a 2.6.34 regression. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=17011 Reported-and-tested-by: Anisse Astier <anisse@astier.eu> Reported-and-tested-by: Pramod Dematagoda <pmd.lotr.gandalf@gmail.com> Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-11-11 09:34:31 -08:00
Feng Tang	6f207e9bb4	x86: mrst: Set vRTC's IRQ to level trigger type When setting up the mpc_intsrc structure for vRTC's IRQ, we need to set its irqflag to level trigger, otherwise it will be taken as edge triggered and the vRTC IRQ will fire only once, as there is never an EOI issued from the IA core for it. The original code worked in previous kernels only by luck: the IRQ fell into the default PCI trigger category, which is level triggered. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101111155019.12924.569.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 17:43:18 +01:00
Vinod Koul	86071535f8	x86: mrst: Add audio driver bindings This patch adds the sound card bindings for Moorestown (pmic_audio) and the Medfield platform (msic_audio) as IPC devices. This ensures they will be created at the right time. Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101110174044.11340.78008.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:28 +01:00
Feng Tang	0146f26145	rtc: Add drivers/rtc/rtc-mrst.c Provide the standard kernel rtc driver interface on top of the vrtc layer added in the previous patch. Signed-off-by: Feng Tang <feng.tang@intel.com> LKML-Reference: <20101110172911.3311.20593.stgit@localhost.localdomain> [Fixed swapped arguments on IPC] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> [Cleaned up and the device creation moved to arch/x86/platform] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:27 +01:00
Feng Tang	7309282c90	x86: mrst: Add vrtc driver which serves as a wall clock device The Moorestown platform doesn't have an m146818 RTC device like a traditional x86 PC, but a firmware-emulated virtual RTC device (vrtc), which provides some basic RTC functions like get/set time. vrtc serves as the only wall clock device on the Moorestown platform. [ tglx: Changed the exports to _GPL ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101110172837.3311.40483.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:27 +01:00
Alek Du	cfb505a7eb	x86: mrst: Add Moorestown specific reboot/shutdown support Moorestown needs to use a special IPC command to reboot or shut down the platform. Signed-off-by: Alek Du <alek.du@intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101110164928.6365.94243.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-11 11:34:27 +01:00
Steven Rostedt	b590854853	tracing: Force arch_local_irq_* notrace for paravirt When running ktest.pl randconfig tests, I would sometimes trigger a lockdep annotation bug (possible reason: unannotated irqs-on). This triggering happened right after function tracer self test was executed. After doing a config bisect I found that this was caused by having function tracer, paravirt guest, prove locking, and rcu torture all enabled. The rcu torture just enhanced the likelihood of triggering the bug. Prove locking was needed, since it was the thing that was bugging. Function tracer would trace and disable interrupts in all sorts of funny places. paravirt guest would turn arch_local_irq_* into functions that would be traced. Besides the fact that tracing arch_local_irq_* is just a bad idea, this is what is happening. The bug happened simply in the local_irq_restore() code: if (raw_irqs_disabled_flags(flags)) { \ raw_local_irq_restore(flags); \ trace_hardirqs_off(); \ } else { \ trace_hardirqs_on(); \ raw_local_irq_restore(flags); \ } \ The raw_local_irq_restore() was defined as arch_local_irq_restore(). Now imagine, we are about to enable interrupts. We go into the else case and call trace_hardirqs_on() which tells lockdep that we are enabling interrupts, so it sets the current->hardirqs_enabled = 1. Then we call raw_local_irq_restore() which calls arch_local_irq_restore() which gets traced! Now in the function tracer we disable interrupts with local_irq_save(). This is fine, but the saved flags record that interrupts are disabled. When the function tracer calls local_irq_restore() it does it, but this time with flags set as disabled, so we go into the if () path. This keeps interrupts disabled and calls trace_hardirqs_off() which sets current->hardirqs_enabled = 0. When the tracer is finished and proceeds with the original code, we enable interrupts but leave current->hardirqs_enabled as 0. Which now breaks lockdep's internal processing.
Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-11-10 22:29:49 -05:00
Ian Campbell	9ec23a7f6d	xen: do not release any memory under 1M in domain 0 We already deliberately setup a 1-1 P2M for the region up to 1M in order to allow code which assumes this region is already mapped to work without having to convert everything to ioremap. Domain 0 should not return any apparently unused memory regions (reserved or otherwise) in this region to Xen since the e820 may not accurately reflect what the BIOS has stashed in this region. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-11-10 17:19:25 -08:00
Kees Cook	6036f373ea	x86, cpu: Only CPU features determine NX capabilities Fix the NX feature boot warning when NX is missing to correctly reflect that BIOSes cannot disable NX now. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-5-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:43:15 -08:00
Kees Cook	ebba638ae7	x86, cpu: Call verify_cpu during 32bit CPU startup The XD_DISABLE-clearing side-effect needs to happen for both 32bit and 64bit, but the 32bit init routines were not calling verify_cpu() yet. This adds that call to gain the side-effect. The longmode/SSE tests being performed in verify_cpu() need to happen very early for 64bit but not for 32bit. Instead of including it in two places for 32bit, we can just include it once in arch/x86/kernel/head_32.S. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-4-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:43:09 -08:00
Kees Cook	ae84739c27	x86, cpu: Clear XD_DISABLED flag on Intel to regain NX Intel CPUs have an additional MSR bit to indicate if the BIOS was configured to disable the NX cpu feature. This bit was traditionally used for operating systems that did not understand how to handle the NX bit. Since Linux understands this, this BIOS flag should be ignored by default. In a review[1] of reported hardware being used by Ubuntu bug reporters, almost 10% of systems had an incorrectly configured BIOS, leaving their systems unable to use the NX features of their CPU. This change will clear the MSR_IA32_MISC_ENABLE_XD_DISABLE bit so that NX cannot be inappropriately controlled by the BIOS on Intel CPUs. If, under very strange hardware configurations, NX actually needs to be disabled, "noexec=off" can be used to restore the prior behavior. [1] http://www.outflux.net/blog/archives/2010/02/18/data-mining-for-nx-bit/ Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-3-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:42:54 -08:00
Kees Cook	c5cbac6942	x86, cpu: Rename verify_cpu_64.S to verify_cpu.S The code is 32bit already, and can be used in 32bit routines. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-2-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:42:42 -08:00
Peter Zijlstra	034c6efa46	perf, amd: Use kmalloc_node(,__GFP_ZERO) for northbridge structure allocation Jesper suggested we use the zeroing capability of the allocators instead of calling memset ourselves. Add node affinity while we're at it. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 22:58:40 +01:00
Borislav Petkov	c7657ac0c3	x86, microcode, AMD: Cleanup code a bit get_ucode_data is a memcpy() wrapper which always returns 0. Move it into the header and make it an inline. Remove all code checking its return value and turn it into a void. There should be no functionality change resulting from this patch. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-10 14:54:54 +01:00
Jesper Juhl	1ea6be212e	x86, microcode, AMD: Replace vmalloc+memset with vzalloc We don't have to do memset() ourselves after vmalloc() when we have vzalloc(), so change that in arch/x86/kernel/microcode_amd.c::get_next_ucode(). Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-10 14:48:57 +01:00
Kusanagi Kouichi	1f523bf367	x86, pvclock: Remove leftover scale_delta() function Commit 92580d64e16402762e2acc3022f065397c780425 ("x86: pvclock: Move scale_delta into common header") forgot to remove scale_delta. Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Cc: Zachary Amsden <zamsden@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Glauber Costa <glommer@redhat.com> LKML-Reference: <20101105110444.BAF6D6FC03B@msa105.auone-net.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:32:15 +01:00
Jesper Juhl	2a8dcbd6cd	x86, apic: Remove double #include Remove the second <asm/atomic.h> inclusion. Signed-off-by: Jesper Juhl <jj@chaosbits.net> LKML-Reference: <alpine.LNX.2.00.1011072253360.26247@swampdragon.chaosbits.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:21:16 +01:00
Jan Beulich	2f62bf7d23	x86: Adjust section annotations in AMD Fam10 MMCONF enabling code check_enable_amd_mmconf_dmi() gets called only for the BSP, hence everything hanging off of it can be __init*. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CD2DE1E0200007800020990@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:08:26 +01:00
Jack Steiner	62b0cfc240	x86, UV: Update node controller MMRs A new version of the SGI UV hub node controller is being developed. A few of the MMRs (control registers) that exist on the current hub no longer exist on the new hub. Fortunately, there are alternate MMRs that are functionally equivalent and that exist on both hubs. This patch changes the UV code to use MMRs that exist in BOTH versions of the hub node controller. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20101106204056.GA27584@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 10:06:38 +01:00
Jesper Juhl	8e5e9521c1	x86: Remove unnecessary casts of void ptr returning alloc function return values The [vk][cmz]alloc(_node) family of functions return void pointers, which are completely unnecessary/pointless to cast to other pointer types since that conversion happens implicitly. This patch removes such casts from arch/x86. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: trivial@kernel.org Cc: amd64-microcode@amd64.org Cc: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <alpine.LNX.2.00.1011082310220.23697@swampdragon.chaosbits.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-10 09:13:00 +01:00
Andi Kleen	0059b2436a	x86: Address gcc4.6 "set but not used" warnings in apic.h native_apic_msr_read() and x2apic_enabled() use rdmsr(msr, low, high), but only use the low part. gcc4.6 complains about this: .../apic.h:144:11: warning: variable 'high' set but not used [-Wunused-but-set-variable] rdmsr() is just a wrapper around rdmsrl() which splits the 64bit value into low and high, so using rdmsrl() directly solves this. [tglx: Changed the variables to u64 as suggested by Cyrill. It's less confusing and has no code impact as this is 64bit only anyway. Massaged changelog as well. ] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: x86@kernel.org Cc: Cyrill Gorcunov <gorcunov@gmail.com> LKML-Reference: <1289251229-19589-1-git-send-email-andi@firstfloor.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-09 18:40:30 +01:00
Jacob Pan	7f05dec3dd	x86: mrst: Parse SFI timer table for all timer configs Penwell has APB timer based watchdog timers; it requires platform code to parse SFI MTMR tables in order to claim its timer. This patch will always parse SFI MTMR regardless of system timer configuration choices. Otherwise, the SFI MTMR table may not get parsed if running on Medfield with always-on local APIC timers and constant TSC. The watchdog timer driver will then not get a timer to use. Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: Alan Cox <alan@linux.intel.com> LKML-Reference: <20101109112800.20591.10802.stgit@localhost.localdomain> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-09 14:45:52 +01:00
Feng Tang	1da4b1c6a4	x86/mrst: Add SFI platform device parsing code SFI provides a series of tables. These describe the platform devices present including SPI and I²C devices, as well as various sensors, keypads and other glue as well as interfaces provided via the SCU IPC mechanism (intel_scu_ipc.c). This patch is a merge of the core elements and relevant fixes from the Intel development code by Feng, Alek, myself into a single coherent patch for upstream submission. It provides the needed infrastructure to register I2C, SPI and platform devices described by the tables, as well as handlers for some of the hardware already supported in kernel. The 0.8 firmware also provides GPIO tables. Devices are created at boot time or, if they are SCU dependent, at the point an SCU is discovered. The existing Linux device mechanisms will then handle the device binding. At an abstract level this is an SFI to Linux device translator. Device/platform specific setup/glue is in this file. This is done so that the drivers for the generic I²C and SPI bus devices remain cross platform as they should. (Updated from RFC version to correct the emc1403 name used by the firmware and a wrongly used #define) Signed-off-by: Alek Du <alek.du@linux.intel.com> LKML-Reference: <20101109112158.20013.6158.stgit@localhost.localdomain> [Clean ups, removal of 0.7 support] Signed-off-by: Feng Tang <feng.tang@linux.intel.com> [Clean ups] Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-09 14:45:52 +01:00
Jiri Slaby	07cf2a64c2	xen: fix memory leak in Xen PCI MSI/MSI-X allocator. Stanse found that xen_setup_msi_irqs leaks memory when xen_allocate_pirq fails. Free the memory in that fail path. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xensource.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org	2010-11-08 11:30:00 -05:00
Jan Kiszka	453d9c57e2	KVM: x86: Issue smp_call_function_many with preemption disabled smp_call_function_many is specified to be called only with preemption disabled. Fulfill this requirement. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:27 -02:00
Vasiliy Kulikov	97e69aa62f	KVM: x86: fix information leak to userland Structures kvm_vcpu_events, kvm_debugregs, kvm_pit_state2 and kvm_clock_data are copied to userland with some padding and reserved fields uninitialized. It leads to leaking of contents of kernel stack memory. We have to initialize them to zero. In patch v1 Jan Kiszka suggested to fill reserved fields with zeros instead of memset'ting the whole struct. It makes sense as these fields are explicitly marked as padding. No more fields need zeroing. KVM-Stable-Tag. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-11-05 14:42:27 -02:00
Marcelo Tosatti	eb45fda45f	KVM: MMU: fix rmap_remove on non present sptes drop_spte should not attempt to rmap_remove a non present shadow pte. This fixes a BUG_ON seen on kvm-autotest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Lucas Meneghel Rodrigues <lmr@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-11-05 14:42:26 -02:00
Michael S. Tsirkin	edde99ce05	KVM: Write protect memory after slot swap I have observed the following bug trigger: 1. userspace calls GET_DIRTY_LOG 2. kvm_mmu_slot_remove_write_access is called and makes a page ro 3. page fault happens and makes the page writeable; fault is logged in the bitmap appropriately 4. kvm_vm_ioctl_get_dirty_log swaps slot pointers; a lot of time passes 5. guest writes into the page 6. userspace calls GET_DIRTY_LOG At point (5), bitmap is clean and page is writeable, thus, guest modification of memory is not logged and GET_DIRTY_LOG returns an empty bitmap. The rule is that all pages are either dirty in the current bitmap, or write-protected, which is violated here. It seems that just moving kvm_mmu_slot_remove_write_access down to after the slot pointer swap should fix this bug. KVM-Stable-Tag. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-11-05 14:42:25 -02:00
Uwe Kleine-König	b595076a18	tree-wide: fix comment/printk typos "gadget", "through", "command", "maintain", "maintain", "controller", "address", "between", "initiali[zs]e", "instead", "function", "select", "already", "equal", "access", "management", "hierarchy", "registration", "interest", "relative", "memory", "offset", "already", Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-11-01 15:38:34 -04:00
Rakib Mullick	cf38d0ba7e	x86, mm: Fix section mismatch in tlb.c Mark tlb_cpuhp_notify as __cpuinit. It's basically a callback function, which is called from __cpuinit init_smp_flush(). So - it's safe. We were warned by the following warning: WARNING: arch/x86/mm/built-in.o(.text+0x356d): Section mismatch in reference from the function tlb_cpuhp_notify() to the function .cpuinit.text:calculate_tlb_offset() The function tlb_cpuhp_notify() references the function __cpuinit calculate_tlb_offset(). This is often because tlb_cpuhp_notify lacks a __cpuinit annotation or the annotation of calculate_tlb_offset is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <AANLkTinWQRG=HA9uB3ad0KAqRRTinL6L_4iKgF84coph@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-01 10:09:07 +01:00
Linus Torvalds	f02a38d86a	Merge branches 'perf-fixes-for-linus' and 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: jump label: Add work around to i386 gcc asm goto bug x86, ftrace: Use safe noops, drop trap test jump_label: Fix unaligned traps on sparc. jump label: Make arch_jump_label_text_poke_early() optional jump label: Fix error with preempt disable holding mutex oprofile: Remove deprecated use of flush_scheduled_work() oprofile: Fix the hang while taking the cpu offline jump label: Fix deadlock b/w jump_label_mutex vs. text_mutex jump label: Fix module __init section race * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Check irq_remapped instead of remapping_enabled in destroy_irq()	2010-10-30 11:43:26 -07:00
Yinghai Lu	7b79462a20	x86: Check irq_remapped instead of remapping_enabled in destroy_irq() Russ Anderson reported: \| There is a regression that is causing a NULL pointer dereference \| in free_irte when shutting down xpc. git bisect narrowed it down \| to git commit d585d06(intr_remap: Simplify the code further), which \| changed free_irte(). Reverse applying the patch fixes the problem. We need to use irq_remapped() for each irq instead of checking only intr_remapping_enabled as there might be non remapped irqs even when remapping is enabled. [ tglx: use cfg instead of retrieving it again. Massaged changelog ] Reported-bisected-and-tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <4CCBD511.40607@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-30 10:28:31 +02:00
Linus Torvalds	2d10d8737c	Merge branches 'x86-fixes-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, alternative: Call stop_machine_text_poke() on all cpus x86-32: Restore irq stacks NUMA-aware allocations x86, memblock: Fix early_node_mem with big reserved region. * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, uv: More Westmere support on SGI UV x86, uv: Enable Westmere support on SGI UV	2010-10-29 18:58:00 -07:00
Jason Baron	404ba5d7bb	x86, alternative: Call stop_machine_text_poke() on all cpus Currently, text_poke_smp() passes a NULL as the third argument to __stop_machine(), which will only run stop_machine_text_poke() on 1 cpu. Change NULL -> cpu_online_mask, as stop_machine_text_poke() is intended to be run on all cpus. I actually didn't notice any problems with stop_machine_text_poke() only being called on 1 cpu, but found this via code inspection. Signed-off-by: Jason Baron <jbaron@redhat.com> LKML-Reference: <20101028152026.GB2875@redhat.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-29 16:42:58 -07:00
Ian Campbell	a2d771c036	xen: correct size of level2_kernel_pgt sizeof(pmd_t *) is 4 bytes on 32-bit PAE leading to an allocation of only 2048 bytes. The correct size is sizeof(pmd_t) giving us a full page allocation. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-29 12:23:57 -07:00
Linus Torvalds	1e431a9d64	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb,ppc: Individual register get/set for ppc kgdbts: prevent re-entry to kgdbts before it unregisters debug_core,x86,blackfin: Clean up hw debug disable API kdb: Fix early debugging crash regression kgdb,arm: fix register dump kdb: fix per_cpu command to remove supress mask kdb: Add kdb kernel module sample	2010-10-29 11:49:38 -07:00
Steven Rostedt	45f81b1c96	jump label: Add work around to i386 gcc asm goto bug On i386 (not x86_64) early implementations of gcc would have a bug with asm goto causing it to produce code like the following: (This was noticed by Peter Zijlstra) 56 pushl 0 67 nopl jmp 0x6f popl jmp 0x8c 6f mov test je 0x8c 8c mov call *(%esp) The jump added in the asm goto skipped over the popl that matched the pushl 0, which led to a quick crash of the system when the jump was enabled. The nopl is defined in the asm goto () statement and when tracepoints are enabled, the nop changes to a jump to the label that was specified by the asm goto. asm goto is supposed to tell gcc that the code in the asm might jump to an external label. Here gcc obviously fails to make that work. The bug report for gcc is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226 The bug only appears on x86 when not compiled with -maccumulate-outgoing-args. This option is always set on x86_64 and it is also the work around for a function graph tracer i386 bug. (See commit: `746357d6a5`) This explains why the bug only showed up on i386 when function graph tracer was not enabled. This patch now adds a CONFIG_JUMP_LABEL option that is default off instead of using jump labels by default. When jump labels are enabled, the -maccumulate-outgoing-args will be used (causing a slightly larger kernel image on i386). This option will exist until we have a way to detect if the gcc compiler in use is safe to use on all configurations without the work around. Note, there exists such a test, but for now we will keep the enabling of jump label as a manual option. Archs that know the compiler is safe with asm goto, may choose to select JUMP_LABEL and enable it by default. Reported-by: Ingo Molnar <mingo@elte.hu> Cause-discovered-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Baron <jbaron@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: David Daney <ddaney@caviumnetworks.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: David Miller <davem@davemloft.net> Cc: Richard Henderson <rth@redhat.com> LKML-Reference: <1288028746.3673.11.camel@laptop> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-29 14:45:29 -04:00
Dongdong Deng	d7ba979d45	debug_core,x86,blackfin: Clean up hw debug disable API The kgdb_disable_hw_debug() was an architecture specific function for disabling all hardware breakpoints on a per cpu basis when entering the debug core. This patch will remove the weak function kgdb_disable_hw_debug() and change it into a call back which lives with the rest of hw breakpoint call backs in struct kgdb_arch. Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-10-29 13:14:41 -05:00
H. Peter Anvin	2d1d7126bb	x86, ftrace: Use safe noops, drop trap test Always use a safe 5-byte noop sequence. Drop the trap test, since it is known to return false negatives on some virtualization platforms on 32 bits. The resulting code is both simpler and safer. Cc: Daniel Drake <dsd@laptop.org> Cc: Jason Baron <jbaron@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-29 13:07:59 -04:00
Eric Dumazet	5c1eb08936	x86-32: Restore irq stacks NUMA-aware allocations Commit `22d4cd4c4d` ("Allocate irq stacks seperate from percpu area") removed NUMA affinity of IRQ stacks as a side-effect of the fix. Using alloc_pages_node() instead of __get_free_pages() is safe, even if the target node has no available LOWMEM pages: alloc_pages_node() falls back to another node. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Brian Gerst <brgerst@gmail.com> Cc: tj@kernel.org Cc: torvalds@linux-foundation.org Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <1288276854.2649.607.camel@edumazet-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-29 08:17:07 +02:00
Russ Anderson	0520bd8438	x86, uv: More Westmere support on SGI UV Enable Westmere support for all APIC modes on SGI UV. Signed-off-by: Russ Anderson <rja@sgi.com> LKML-Reference: <20101028224132.GB15804@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-28 22:38:07 -07:00
Linus Torvalds	18cb657ca1	Merge branch 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen and branch 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: register xen pci notifier xen: initialize cpu masks for pv guests in xen_smp_init xen: add a missing #include to arch/x86/pci/xen.c xen: mask the MTRR feature from the cpuid xen: make hvc_xen console work for dom0. xen: add the direct mapping area for ISA bus access xen: Initialize xenbus for dom0. xen: use vcpu_ops to setup cpu masks xen: map a dummy page for local apic and ioapic in xen_set_fixmap xen: remap MSIs into pirqs when running as initial domain xen: remap GSIs as pirqs when running as initial domain xen: introduce XEN_DOM0 as a silent option xen: map MSIs into pirqs xen: support GSI -> pirq remapping in PV on HVM guests xen: add xen hvm acpi_register_gsi variant acpi: use indirect call to register gsi in different modes xen: implement xen_hvm_register_pirq xen: get the maximum number of pirqs from xen xen: support pirq != irq * 'stable/xen-pcifront-0.8.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: (27 commits) X86/PCI: Remove the dependency on isapnp_disable. xen: Update Makefile with CONFIG_BLOCK dependency for biomerge.c MAINTAINERS: Add myself to the Xen Hypervisor Interface and remove Chris Wright. x86: xen: Sanitse irq handling (part two) swiotlb-xen: On x86-32 builts, select SWIOTLB instead of depending on it. MAINTAINERS: Add myself for Xen PCI and Xen SWIOTLB maintainer. xen/pci: Request ACS when Xen-SWIOTLB is activated. xen-pcifront: Xen PCI frontend driver. xenbus: prevent warnings on unhandled enumeration values xenbus: Xen paravirtualised PCI hotplug support. xen/x86/PCI: Add support for the Xen PCI subsystem x86: Introduce x86_msi_ops msi: Introduce default_[teardown\|setup]_msi_irqs with fallback. x86/PCI: Export pci_walk_bus function. x86/PCI: make sure _PAGE_IOMAP it set on pci mappings x86/PCI: Clean up pci_cache_line_size xen: fix shared irq device passthrough xen: Provide a variant of xen_poll_irq with timeout. xen: Find an unbound irq number in reverse order (high to low). xen: statically initialize cpu_evtchn_mask_p ... Fix up trivial conflicts in drivers/pci/Makefile	2010-10-28 17:11:17 -07:00
Linus Torvalds	51399a3919	Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (38 commits) kbuild: convert `arch/tile' to the kconfig mainmenu upgrade README: cite nconfig Revert "kconfig: Temporarily disable dependency warnings" kconfig: Use PATH_MAX instead of 128 for path buffer sizes. kconfig: Fix realloc usage() kconfig: Propagate const kconfig: Don't go out from read config loop when you read new symbol kconfig: fix menuconfig on debian lenny kbuild: migrate all arch to the kconfig mainmenu upgrade kconfig: expand file names kconfig: use the file's name of sourced file kconfig: constify file name kconfig: don't emit warning upon rootmenu's prompt redefinition kconfig: replace KERNELVERSION usage by the mainmenu's prompt kconfig: delay gconf window initialization kconfig: expand by default the rootmenu's prompt kconfig: add a symbol string expansion helper kconfig: regen parser kconfig: implement the `mainmenu' directive kconfig: allow PACKAGE to be defined on the compiler's command-line ... Fix up trivial conflict in arch/mn10300/Kconfig	2010-10-28 16:16:39 -07:00
Yinghai Lu	419db274be	x86, memblock: Fix early_node_mem with big reserved region. Xen can reserve huge amounts of memory for pre-ballooning, but that still shows as RAM in the e820 memory map. early_node_mem could not find a range because of start/end adjusting, and will go through the fallback path. However, the fallback path is still using memblock_x86_find_range_node(), and it is only partially top-down because it goes through active_range entries from low to high. Let's use memblock_find_in_range instead of memblock_x86_find_range_node, so that the fallback path is really top-down. We may still need to make memblock_x86_find_range_node do overall top-down work. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CC9A9C9.8020700@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-28 15:52:36 -07:00
Linus Torvalds	2d3b07c07b	Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Move olpc to platform x86: Move uv to platform x86: Move mrst to platform x86: Move scx200 to platform x86: Move visws to platform x86: Move efi to platform x86: Move sfi to platform x86: Add platform directory	2010-10-28 12:25:42 -07:00
Linus Torvalds	e9f29c9a56	Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (27 commits) x86: allocate space within a region top-down x86: update iomem_resource end based on CPU physical address capabilities x86/PCI: allocate space from the end of a region, not the beginning PCI: allocate bus resources from the top down resources: support allocating space within a region from the top down resources: handle overflow when aligning start of available area resources: ensure callback doesn't allocate outside available space resources: factor out resource_clip() to simplify find_resource() resources: add a default alignf to simplify find_resource() x86/PCI: MMCONFIG: fix region end calculation PCI: Add support for polling PME state on suspended legacy PCI devices PCI: Export some PCI PM functionality PCI: fix message typo PCI: log vendor/device ID always PCI: update Intel chipset names and defines PCI: use new ccflags variable in Makefile PCI: add PCI_MSIX_TABLE/PBA defines PCI: add PCI vendor id for STmicroelectronics x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs PCI: OLPC: Only enable PCI configuration type override on XO-1 ...	2010-10-28 11:59:52 -07:00
Linus Torvalds	a042e26137	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits) perf python scripting: Add futex-contention script perf python scripting: Fixup cut'n'paste error in sctop script perf scripting: Shut up 'perf record' final status perf record: Remove newline character from perror() argument perf python scripting: Support fedora 11 (audit 1.7.17) perf python scripting: Improve the syscalls-by-pid script perf python scripting: print the syscall name on sctop perf python scripting: Improve the syscalls-counts script perf python scripting: Improve the failed-syscalls-by-pid script kprobes: Remove redundant text_mutex lock in optimize x86/oprofile: Fix uninitialized variable use in debug printk tracing: Fix 'faild' -> 'failed' typo perf probe: Fix format specified for Dwarf_Off parameter perf trace: Fix detection of script extension perf trace: Use $PERF_EXEC_PATH in canned report scripts perf tools: Document event modifiers perf tools: Remove direct slang.h include perf_events: Fix for transaction recovery in group_sched_in() perf_events: Revert: Fix transaction recovery in group_sched_in() perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations ...	2010-10-27 18:48:00 -07:00
Linus Torvalds	17bb51d56c	Merge branch 'akpm-incoming-2' * akpm-incoming-2: (139 commits) epoll: make epoll_wait() use the hrtimer range feature select: rename estimate_accuracy() to select_estimate_accuracy() Remove duplicate includes from many files ramoops: use the platform data structure instead of module params kernel/resource.c: handle reinsertion of an already-inserted resource kfifo: fix kfifo_alloc() to return a signed int value w1: don't allow arbitrary users to remove w1 devices alpha: remove dma64_addr_t usage mips: remove dma64_addr_t usage sparc: remove dma64_addr_t usage fuse: use release_pages() taskstats: use real microsecond granularity for CPU times taskstats: split fill_pid function taskstats: separate taskstats commands delayacct: align to 8 byte boundary on 64-bit systems delay-accounting: reimplement -c for getdelays.c to report information on a target command namespaces Kconfig: move namespace menu location after the cgroup namespaces Kconfig: remove the cgroup device whitelist experimental tag namespaces Kconfig: remove pointless cgroup dependency namespaces Kconfig: make namespace a submenu ...	2010-10-27 18:42:52 -07:00
Linus Torvalds	0671b7674f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: percpu: Remove the multi-page alignment facility x86-32: Allocate irq stacks seperate from percpu area x86-32, mm: Remove duplicated #include x86, printk: Get rid of <0> from stack output x86, kexec: Make sure to stop all CPUs before exiting the kernel x86/vsmp: Eliminate kconfig dependency warning	2010-10-27 18:38:55 -07:00
Zimny Lech	61d8e11e51	Remove duplicate includes from many files Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:18 -07:00
Namhyung Kim	eb5a369931	ptrace: cleanup arch_ptrace() on x86 Remove checking @addr less than 0 because @addr is now unsigned and use new udescp variable in order to remove unnecessary castings. [akpm@linux-foundation.org: fix unused variable 'udescp'] Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:10 -07:00
Namhyung Kim	9b05a69e05	ptrace: change signature of arch_ptrace() Fix up the arguments to arch_ptrace() to take account of the fact that @addr and @data are now unsigned long rather than long as of a preceding patch in this series. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: <linux-arch@vger.kernel.org> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:10 -07:00
Peter Zijlstra	20273941f2	mm: fix race in kunmap_atomic() Christoph reported a nice splat which illustrated a race in the new stack based kmap_atomic implementation. The problem is that we pop our stack slot before we're completely done resetting its state -- in particular clearing the PTE (sometimes that's CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear the PTE used for the last slot, that interrupt can reuse the slot in a dirty state, which triggers a BUG in kmap_atomic(). Fix this by introducing kmap_atomic_idx() which reports the current slot index without actually releasing it and use that to find the PTE and delay the _pop() until after we're completely done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Christoph Hellwig <hch@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:05 -07:00
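The ordering fix described in 20273941f2 can be sketched in user space as follows. This is a minimal illustration, not the kernel code: `slot_stack`, `slot_push()`, `slot_idx()`, `slot_pop()`, `map_page()` and `unmap_page()` are hypothetical names, and a plain array stands in for the per-CPU PTEs. It only shows that the unmap path reads the top index without releasing it, clears the entry, and pops last.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical stand-in for the per-CPU kmap slot stack. */
struct slot_stack {
    int depth;                /* number of slots currently in use */
    const void *pte[NSLOTS];  /* stand-in for each slot's PTE; NULL means clear */
};

/* Claim the next free slot and return its index. */
static int slot_push(struct slot_stack *s)
{
    assert(s->depth < NSLOTS);
    return s->depth++;
}

/* Report the index of the top slot without releasing it. */
static int slot_idx(const struct slot_stack *s)
{
    assert(s->depth > 0);
    return s->depth - 1;
}

/* Release the top slot. */
static void slot_pop(struct slot_stack *s)
{
    assert(s->depth > 0);
    s->depth--;
}

static int map_page(struct slot_stack *s, const void *page)
{
    int idx = slot_push(s);
    assert(s->pte[idx] == NULL);  /* a dirty slot here is the BUG in the report */
    s->pte[idx] = page;
    return idx;
}

static void unmap_page(struct slot_stack *s)
{
    int idx = slot_idx(s);  /* look at the slot, but keep it reserved */
    s->pte[idx] = NULL;     /* finish resetting its state first */
    slot_pop(s);            /* only now may an interrupt reuse the slot */
}
```

If `unmap_page()` popped first and cleared afterwards, an interrupt arriving between the two steps could push onto the same index and trip the assertion in `map_page()`.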
Brian Gerst	22d4cd4c4d	x86-32: Allocate irq stacks seperate from percpu area The percpu allocator cannot handle alignments larger than one page. Allocate the irq stacks separately, and only keep the pointers as percpu data. Signed-off-by: Brian Gerst <brgerst@gmail.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: tj@kernel.org LKML-Reference: <1288158182-1753-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-27 17:31:42 +02:00
Thomas Gleixner	8654b1c2de	x86: Move olpc to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andres Salomon <dilinger@queued.net>	2010-10-27 17:22:16 +02:00
Thomas Gleixner	329b84e42e	x86: Move uv to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mike Travis <travis@sgi.com>	2010-10-27 14:30:02 +02:00
Thomas Gleixner	9694d4afc1	x86: Move mrst to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jacob Pan <jacob.jun.pan@intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	3b3da9d25a	x86: Move scx200 to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	c4e72ad6bb	x86: Move visws to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	b17ed48040	x86: Move efi to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Huang Ying <ying.huang@intel.com>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	937f961a65	x86: Move sfi to platform Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>	2010-10-27 14:30:01 +02:00
Thomas Gleixner	3adbb7f4a3	x86: Add platform directory x86 has finally arrived in the embedded nightmare and will rapidly grow SoC platform support in various flavours. So we need a place for the platform support files. That also allows us to clean up the dumpground which arch/x86/kernel has become over time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-27 14:30:01 +02:00
Linus Torvalds	520045db94	Merge branches 'upstream/xenfs' and 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen * 'upstream/xenfs' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: xen/privcmd: make privcmd visible in domU xen/privcmd: move remap_domain_mfn_range() to core xen code and export. privcmd: MMAPBATCH: Fix error handling/reporting xenbus: export xen_store_interface for xenfs xen/privcmd: make sure vma is ours before doing anything to it xen/privcmd: print SIGBUS faults xen/xenfs: set_page_dirty is supposed to return true if it dirties xen/privcmd: create address space to allow writable mmaps xen: add privcmd driver xen: add variable hypercall caller xen: add xen_set_domain_pte() xen: add /proc/xen/xsd_{kva,port} to xenfs * 'upstream/core' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (29 commits) xen: include xen/xen.h for definition of xen_initial_domain() xen: use host E820 map for dom0 xen: correctly rebuild mfn list list after migration. xen: improvements to VIRQ_DEBUG output xen: set up IRQ before binding virq to evtchn xen: ensure that all event channels start off bound to VCPU 0 xen/hvc: only notify if we actually sent something xen: don't add extra_pages for RAM after mem_end xen: add support for PAT xen: make sure xen_max_p2m_pfn is up to date xen: limit extra memory to a certain ratio of base xen: add extra pages for E820 RAM regions, even if beyond mem_end xen: make sure xen_extra_mem_start is beyond all non-RAM e820 xen: implement "extra" memory to reserve space for pages not present at boot xen: Use host-provided E820 map xen: don't map missing memory xen: defer building p2m mfn structures until kernel is mapped xen: add return value to set_phys_to_machine() xen: convert p2m to a 3 level tree xen: make install_p2mtop_page() static ... Fix up trivial conflict in arch/x86/xen/mmu.c, and fix the use of 'reserve_early()' - in the new memblock world order it is now 'memblock_x86_reserve_range()' instead. Pointed out by Jeremy.	2010-10-26 18:20:19 -07:00
Linus Torvalds	474829e875	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (53 commits) ACPI: install ACPI table handler before any dynamic tables being loaded ACPI / PM: Blacklist another machine that needs acpi_sleep=nonvs ACPI: Page based coalescing of I/O remappings optimization ACPI: Convert simple locking to RCU based locking ACPI: Pre-map 'system event' related register blocks ACPI: Add interfaces for ioremapping/iounmapping ACPI registers ACPI: Maintain a list of ACPI memory mapped I/O remappings ACPI: Fix ioremap size for MMIO reads and writes ACPI / Battery: Return -ENODEV for unknown values in get_property() ACPI / PM: Fix reference counting of power resources Subject: [PATCH] ACPICA: Fix Scope() op in module level code ACPI battery: support percentage battery remaining capacity ACPI: Make Embedded Controller command timeout delay configurable ACPI dock: move some functions to .init.text ACPI: thermal: remove unused limit code ACPI: static sleep_states[] and acpi_gts_bfs_check ACPI: remove dead code ACPI: delete dedicated MAINTAINERS entries for ACPI EC and BATTERY drivers ACPI: Only processor needs CPU_IDLE ACPICA: Update version to 20101013 ...	2010-10-26 17:28:37 -07:00
Andrew Morton	ca1cab37d9	workqueues: s/ON_STACK/ONSTACK/ Silly though it is, completions and wait_queue_heads use foo_ONSTACK (COMPLETION_INITIALIZER_ONSTACK, DECLARE_COMPLETION_ONSTACK, __WAIT_QUEUE_HEAD_INIT_ONSTACK and DECLARE_WAIT_QUEUE_HEAD_ONSTACK) so I guess workqueues should do the same thing. s/INIT_WORK_ON_STACK/INIT_WORK_ONSTACK/ s/INIT_DELAYED_WORK_ON_STACK/INIT_DELAYED_WORK_ONSTACK/ Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:14 -07:00
Hagen Paul Pfeifer	732eacc054	replace nested max/min macros with {max,min}3 macro Use the new {max,min}3 macros to save some cycles and bytes on the stack. This patch substitutes trivial nested macros with their counterpart. Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Cc: Joe Perches <joe@perches.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Hartley Sweeten <hsweeten@visionengravers.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:12 -07:00
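As a rough illustration of the substitution made in 732eacc054, the sketch below puts a nested two-argument minimum next to a three-argument helper. The kernel's `min3()`/`max3()` are type-generic macros; the `int`-only functions `min2_int()`, `min3_int()` and `min_nested()` are simplified stand-ins invented for this example.

```c
/* Simplified, int-only stand-ins for illustration (not the kernel macros). */
static inline int min2_int(int a, int b)
{
    return a < b ? a : b;
}

/* Three-way minimum computed in a single helper. */
static inline int min3_int(int a, int b, int c)
{
    int m = a;

    if (b < m)
        m = b;
    if (c < m)
        m = c;
    return m;
}

/* The nested form that the patch replaces at call sites. */
static inline int min_nested(int a, int b, int c)
{
    return min2_int(a, min2_int(b, c));
}
```

Both forms return the same value; the change is about how the call sites read and what the macro expansion costs.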
Michel Lespinasse	68da336a14	x86: access_error API cleanup access_error() already takes error_code as an argument, so there is no need for an additional write flag. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:09 -07:00
Michel Lespinasse	d065bd810b	mm: retry page fault when blocking on disk transfer This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [akpm@linux-foundation.org: fix warning & crash] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:09 -07:00
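The retry flow described in d065bd810b can be sketched as a small user-space simulation. Everything here is hypothetical: `struct sim`, `SIM_ALLOW_RETRY`, `SIM_FAULT_RETRY`, `handle_fault()` and `do_fault()` are illustration-only names, a boolean stands in for mmap_sem, and limiting the loop to a single retry is an assumption of this sketch rather than something the commit message states.

```c
#include <stdbool.h>

enum { SIM_ALLOW_RETRY = 1 };                    /* caller permits dropping the lock */
enum { SIM_FAULT_DONE = 0, SIM_FAULT_RETRY = 1 };

struct sim {
    bool page_cached;  /* has the disk transfer finished? */
    bool lock_held;    /* stand-in for mmap_sem */
    int attempts;      /* how many times the fault was handled */
};

/* Stand-in for the fault path: drop the lock instead of blocking with it held. */
static int handle_fault(struct sim *s, unsigned flags)
{
    s->attempts++;
    if (!s->page_cached && (flags & SIM_ALLOW_RETRY)) {
        s->lock_held = false;   /* release before waiting on I/O */
        s->page_cached = true;  /* pretend the transfer completed meanwhile */
        return SIM_FAULT_RETRY;
    }
    s->page_cached = true;      /* otherwise wait with the lock held */
    return SIM_FAULT_DONE;
}

/* Stand-in for the caller: re-acquire the lock and retry the fault. */
static int do_fault(struct sim *s)
{
    unsigned flags = SIM_ALLOW_RETRY;

    for (;;) {
        s->lock_held = true;
        if (handle_fault(s, flags) != SIM_FAULT_RETRY)
            break;
        flags &= ~(unsigned)SIM_ALLOW_RETRY;  /* assumption: retry at most once */
    }
    s->lock_held = false;
    return s->attempts;
}
```

On the second pass the page is already cached, so the lock is held only briefly, which is the effect the commit message measures.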
Peter Zijlstra	7a837d1bb7	perf, x86: Fix up kmap_atomic() type Now that the KM_type stuff is history, clean up the compiler warning. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Peter Zijlstra	ece0e2b640	mm: remove pte_map_nested() Since we no longer need to provide KM_type, the whole pte_map_nested() API is now redundant, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Peter Zijlstra	3e4d3af501	mm: stack based kmap_atomic() Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Bjorn Helgaas	1af3c2e45e	x86: allocate space within a region top-down Request that allocate_resource() use available space from high addresses first, rather than the default of using low addresses first. The most common place this makes a difference is when we move or assign new PCI device resources. Low addresses are generally scarce, so it's better to use high addresses when possible. This follows Windows practice for PCI allocation. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c42 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-26 15:33:45 -07:00
Bjorn Helgaas	419afdf53c	x86: update iomem_resource end based on CPU physical address capabilities The iomem_resource map reflects the available physical address space. We statically initialize the end to -1, i.e., 0xffffffff_ffffffff, but of course we can only use as much as the CPU can address. This patch updates the end based on the CPU capabilities, so we don't mistakenly allocate space that isn't usable, as we're likely to do when allocating from the top-down. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-26 15:33:44 -07:00
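The idea in 419afdf53c, limiting the end of the address map to what the CPU can address, comes down to a small piece of arithmetic. The helper below is a sketch under stated assumptions: `phys_addr_end()` is a hypothetical name, and the number of physical address bits is passed in as a plain parameter instead of being read from CPU data.

```c
#include <stdint.h>

/*
 * Highest physical address reachable with `phys_bits` address bits.
 * Hypothetical helper for illustration only.
 */
static uint64_t phys_addr_end(unsigned phys_bits)
{
    if (phys_bits >= 64)
        return UINT64_MAX;  /* same value as the static default of -1 */
    return ((uint64_t)1 << phys_bits) - 1;
}
```

For example, 36 address bits give an end of 0xfffffffff, far below the statically initialized 0xffffffff_ffffffff.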
Bjorn Helgaas	dc9887dc02	x86/PCI: allocate space from the end of a region, not the beginning Allocate from the end of a region, not the beginning. For example, if we need to allocate 0x800 bytes for a device on bus 0000:00 given these resources: [mem 0xbff00000-0xdfffffff] PCI Bus 0000:00 [mem 0xc0000000-0xdfffffff] PCI Bus 0000:02 the available space at [mem 0xbff00000-0xbfffffff] is passed to the alignment callback (pcibios_align_resource()). Prior to this patch, we would put the new 0x800 byte resource at the beginning of that available space, i.e., at [mem 0xbff00000-0xbff007ff]. With this patch, we put it at the end, at [mem 0xbffff800-0xbfffffff]. Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228#c41 Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-26 15:33:42 -07:00
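The placement change in dc9887dc02 can be checked with the numbers from its own example. The sketch below is not `pcibios_align_resource()`; `place_bottom_up()`, `place_top_down()`, `args_ok()` and `NO_FIT` are hypothetical names, and the 0x800 alignment used in the usage note is an assumption, since the commit message gives only the size.

```c
#include <stdbool.h>
#include <stdint.h>

#define NO_FIT UINT64_MAX  /* hypothetical "does not fit" marker */

/* Basic sanity checks shared by both placement strategies. */
static bool args_ok(uint64_t avail_start, uint64_t avail_end,
                    uint64_t size, uint64_t align)
{
    if (size == 0 || align == 0 || (align & (align - 1)) != 0)
        return false;  /* align must be a power of two */
    if (avail_end < avail_start)
        return false;
    return avail_end - avail_start >= size - 1;
}

/* Old behaviour: lowest aligned start inside the available window. */
static uint64_t place_bottom_up(uint64_t avail_start, uint64_t avail_end,
                                uint64_t size, uint64_t align)
{
    if (!args_ok(avail_start, avail_end, size, align))
        return NO_FIT;
    uint64_t start = (avail_start + align - 1) & ~(align - 1);
    if (start < avail_start || start > avail_end ||
        avail_end - start < size - 1)
        return NO_FIT;
    return start;
}

/* New behaviour: highest aligned start that still fits below avail_end. */
static uint64_t place_top_down(uint64_t avail_start, uint64_t avail_end,
                               uint64_t size, uint64_t align)
{
    if (!args_ok(avail_start, avail_end, size, align))
        return NO_FIT;
    uint64_t start = (avail_end - (size - 1)) & ~(align - 1);
    if (start < avail_start)
        return NO_FIT;
    return start;
}
```

With the window [mem 0xbff00000-0xbfffffff], a size of 0x800 and the assumed 0x800 alignment, `place_bottom_up()` yields 0xbff00000 and `place_top_down()` yields 0xbffff800, matching the before and after ranges quoted in the commit message.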
Russ Anderson	c8f730b1ab	x86, uv: Enable Westmere support on SGI UV Enable Westmere support on SGI UV. The UV initialization code is dependent on the APICID bits. Westmere-EX uses a different APIC bit mapping than Nehalem-EX. This code reads the apic shift value from a UV MMR to do the proper bit decoding to determine the pnode. Signed-off-by: Russ Anderson <rja@sgi.com> LKML-Reference: <20101026212728.GB15071@sgi.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-26 15:15:28 -07:00
Stefano Stabellini	ea5b8f7393	xen: initialize cpu masks for pv guests in xen_smp_init Pv guests don't have ACPI and need the cpu masks to be set correctly as early as possible so we call xen_fill_possible_map from xen_smp_init. On the other hand the initial domain supports ACPI so in this case we skip xen_fill_possible_map and rely on it. However Xen might limit the number of cpus usable by the domain, so we filter those masks during smp initialization using the VCPUOP_is_up hypercall. It is important that the filtering is done before xen_setup_vcpu_info_placement. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-10-26 20:33:15 +01:00
Linus Torvalds	f1ebdd60cc	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...	2010-10-26 10:13:10 -07:00
Linus Torvalds	4f6876031e	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit. [CPUFREQ] add sampling_down_factor tunable to improve ondemand performance [CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type [CPUFREQ] drivers/cpufreq: Adjust confusing if indentation	2010-10-26 10:00:04 -07:00
Ian Campbell	45263cb099	xen: include xen/xen.h for definition of xen_initial_domain() CC arch/x86/xen/setup.o arch/x86/xen/setup.c: In function 'xen_memory_setup': arch/x86/xen/setup.c:161: error: implicit declaration of function 'xen_initial_domain' Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-25 16:32:48 -07:00
Borislav Petkov	610470ce80	x86-32, mm: Remove duplicated #include `b40827fa72` added an include directive which is needless and is taken care of by a previous one. Remove it. Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <20101025162523.GA4712@a1.tnic> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 19:39:33 +02:00
Ingo Molnar	7d7a48b760	Merge branch 'linus' into x86/urgent Merge reason: We want to queue up a dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 19:38:52 +02:00
Ingo Molnar	0b849ee888	Merge branch 'x86' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent	2010-10-25 19:17:32 +02:00
Borislav Petkov	9afd281a15	x86-32, mm: Remove duplicated include Commit `b40827fa72` ("x86-32, mm: Add an initial page table for core bootstrapping") added an include directive which is needless and is taken care of by a previous one. Remove it. Caught-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com> Signed-off-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-25 10:05:13 -07:00
Robert Richter	eb48c9cb20	apic, amd: Make firmware bug messages more meaningful This improves error messages in case the BIOS was setting up wrong LVT offsets. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-6-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:43 +02:00
Robert Richter	0a17941e71	mce, amd: Remove goto in threshold_create_device() Removing the goto in threshold_create_device(). Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-5-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:43 +02:00
Robert Richter	bbaff08dca	mce, amd: Add helper functions to setup APIC This patch reworks and cleans up mce_amd_feature_init() by introducing helper functions to setup and check the LVT offset. It also fixes line endings in pr_err() calls. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-4-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:43 +02:00
Robert Richter	7203a04940	mce, amd: Shorten local variables mci_misc_{hi,lo} Shorten these variables to make later changes more readable. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:42 +02:00
Robert Richter	9c37c9d897	mce, amd: Implement mce_threshold_block_init() helper function This patch adds a helper function for the initial setup of an mce threshold block. The LVT offset is passed as argument. Also making variable threshold_defaults local as it is only used in function mce_amd_feature_init(). Function threshold_restart_bank() is extended to setup the LVT offset, the change is backward compatible. Thus, now there is only a single wrmsrl() to setup the block. Signed-off-by: Robert Richter <robert.richter@amd.com> Acked-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <1288015419-29543-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 18:59:42 +02:00
Linus Torvalds	fbaab1dc19	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (44 commits) eeepc-wmi: Add cpufv sysfs interface eeepc-wmi: add additional hotkeys panasonic-laptop: Simplify calls to acpi_pcc_retrieve_biosdata panasonic-laptop: Handle errors properly if they happen intel_pmic_gpio: fix off-by-one value range checking IBM Real-Time "SMI Free" mode driver -v7 Add OLPC XO-1 rfkill driver Move hdaps driver to platform/x86 ideapad-laptop: Fix Makefile intel_pmic_gpio: swap the bits and mask args for intel_scu_ipc_update_register ideapad: Add param: no_bt_rfkill ideapad: Change the driver name to ideapad-laptop ideapad: rewrite the sw rfkill set ideapad: rewrite the hw rfkill notify ideapad: use EC command to control camera ideapad: use return value of _CFG to tell if device exist or not ideapad: make sure we bind on the correct device ideapad: check VPC bit before sync rfkill hw status ideapad: add ACPI helpers dell-laptop: Add debugfs support ...	2010-10-25 08:28:13 -07:00
Robert Richter	4cafc4b8d7	Merge branch 'oprofile/core' into oprofile/x86 Conflicts: arch/x86/oprofile/op_model_amd.c Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-25 16:58:34 +02:00
Ingo Molnar	2c78ffeca9	x86/oprofile: Fix uninitialized variable use in debug printk Stephen Rothwell reported this build warning: arch/x86/oprofile/op_model_amd.c: In function 'ibs_eilvt_valid': arch/x86/oprofile/op_model_amd.c:289: warning: 'offset' may be used uninitialized in this function And correctly observed that indeed the variable is used uninitialized in this function. The result of this bug can be a debug printk with a bogus value. Also fix a few more small details that made this function hard to read and which probably contributed to the bug being introduced to begin with: - Use more symmetric error conditions - Remove the !0 obfuscation - Add newlines to the printk output - Remove bogus linebreaks in printk strings and elsewhere Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Robert Richter <robert.richter@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20101025115736.41d51abe.sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-25 08:46:20 +02:00
Len Brown	38add9b4ba	Merge branches 'bugzilla-15807', 'bugzilla-15979-v2' and 'bugzilla-19162' into release	2010-10-25 02:12:27 -04:00
Linus Torvalds	229aebb873	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits) Update broken web addresses in arch directory. Update broken web addresses in the kernel. Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget Revert "Fix typo: configuation => configuration" partially ida: document IDA_BITMAP_LONGS calculation ext2: fix a typo on comment in ext2/inode.c drivers/scsi: Remove unnecessary casts of private_data drivers/s390: Remove unnecessary casts of private_data net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data drivers/infiniband: Remove unnecessary casts of private_data drivers/gpu/drm: Remove unnecessary casts of private_data kernel/pm_qos_params.c: Remove unnecessary casts of private_data fs/ecryptfs: Remove unnecessary casts of private_data fs/seq_file.c: Remove unnecessary casts of private_data arm: uengine.c: remove C99 comments arm: scoop.c: remove C99 comments Fix typo configue => configure in comments Fix typo: configuation => configuration Fix typo interrest[ing\|ed] => interest[ing\|ed] Fix various typos of valid in comments ... Fix up trivial conflicts in: drivers/char/ipmi/ipmi_si_intf.c drivers/usb/gadget/rndis.c net/irda/irnet/irnet_ppp.c	2010-10-24 13:41:39 -07:00
Linus Torvalds	1765a1fe5d	Merge branch 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (321 commits) KVM: Drop CONFIG_DMAR dependency around kvm_iommu_map_pages KVM: Fix signature of kvm_iommu_map_pages stub KVM: MCE: Send SRAR SIGBUS directly KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED KVM: fix typo in copyright notice KVM: Disable interrupts around get_kernel_ns() KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address KVM: MMU: move access code parsing to FNAME(walk_addr) function KVM: MMU: audit: check whether have unsync sps after root sync KVM: MMU: audit: introduce audit_printk to cleanup audit code KVM: MMU: audit: unregister audit tracepoints before module unloaded KVM: MMU: audit: fix vcpu's spte walking KVM: MMU: set access bit for direct mapping KVM: MMU: cleanup for error mask set while walk guest page table KVM: MMU: update 'root_hpa' out of loop in PAE shadow path KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() KVM: x86: Fix constant type in kvm_get_time_scale KVM: VMX: Add AX to list of registers clobbered by guest switch KVM guest: Move a printk that's using the clock before it's ready KVM: x86: TSC catchup mode ...	2010-10-24 12:47:25 -07:00
Thomas Gleixner	7fb2b870d6	x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage Stupid me forgot to change the function name for the CONFIG_X86_IO_APIC=n case in commit `23f9b2671` (x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings()) Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-24 11:14:14 +02:00
Huang Ying	77db5cbd29	KVM: MCE: Send SRAR SIGBUS directly Originally, SRAR SIGBUS is sent to QEMU-KVM via touching the poisoned page. But commit `9605456919` prevents the signal from being sent. So now the signal is sent via force_sig_info_fault directly. [marcelo: use send_sig_info instead] Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:15 +02:00
Huang Ying	5854dbca9b	KVM: MCE: Add MCG_SER_P into KVM_MCE_CAP_SUPPORTED Now that we have MCG_SER_P (and corresponding SRAO/SRAR MCE) support in the kernel and QEMU-KVM, MCG_SER_P should be added to KVM_MCE_CAP_SUPPORTED to make all this code actually work. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:15 +02:00
Nicolas Kaiser	9611c18777	KVM: fix typo in copyright notice Fix typo in copyright notice. Signed-off-by: Nicolas Kaiser <nikai@nikai.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:14 +02:00
Avi Kivity	395c6b0a9d	KVM: Disable interrupts around get_kernel_ns() get_kernel_ns() wants preemption disabled. It doesn't make a lot of sense during the get/set ioctls (no way to make them non-racy) but the callee wants it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Avi Kivity	7ebaf15eef	KVM: MMU: Avoid sign extension in mmu_alloc_direct_roots() pae root address Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	3377078027	KVM: MMU: move access code parsing to FNAME(walk_addr) function Move access code parsing from caller site to FNAME(walk_addr) function Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	6903074c36	KVM: MMU: audit: check whether have unsync sps after root sync After the root is synced, all unsync sps are synced; this patch adds a check to make sure there are no unsync sps in the VCPU's page table Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:14 +02:00
Xiao Guangrong	38904e1287	KVM: MMU: audit: introduce audit_printk to cleanup audit code Introduce audit_printk, and record the audit point instead of the audit name Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:13 +02:00
Xiao Guangrong	c42fffe3a3	KVM: MMU: audit: unregister audit tracepoints before module unloaded fix: Call Trace: [<ffffffffa01e46ba>] ? kvm_mmu_pte_write+0x229/0x911 [kvm] [<ffffffffa01c6ba9>] ? gfn_to_memslot+0x39/0xa0 [kvm] [<ffffffffa01c6c26>] ? mark_page_dirty+0x16/0x2e [kvm] [<ffffffffa01c6d6f>] ? kvm_write_guest_page+0x67/0x7f [kvm] [<ffffffff81066fbd>] ? local_clock+0x2a/0x3b [<ffffffffa01d52ce>] emulator_write_phys+0x46/0x54 [kvm] ...... Code: Bad RIP value. RIP [<ffffffffa0172056>] 0xffffffffa0172056 RSP <ffff880134f69a70> CR2: ffffffffa0172056 Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:13 +02:00
Xiao Guangrong	98224bf1d1	KVM: MMU: audit: fix vcpu's spte walking After nested nested paging, it may use long mode to shadow a 32/PAE paging guest, so this patch fixes it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:12 +02:00
Xiao Guangrong	33f91edb92	KVM: MMU: set access bit for direct mapping Set the access bit while setting up the direct page table if it's nonpaging or npt enabled; this is good for the CPU's speculative access Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:11 +02:00
Xiao Guangrong	20bd40dc64	KVM: MMU: cleanup for error mask set while walk guest page table Small cleanup for set page fault error code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:10 +02:00
Xiao Guangrong	6292757fb0	KVM: MMU: update 'root_hpa' out of loop in PAE shadow path The value of 'vcpu->arch.mmu.pae_root' is not modified, so we can update 'root_hpa' out of the loop. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:09 +02:00
Sheng Yang	7129eecac1	KVM: x86 emulator: Eliminate compilation warning in x86_decode_insn() Eliminate: arch/x86/kvm/emulate.c:801: warning: ‘sv’ may be used uninitialized in this function on gcc 4.1.2 Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:09 +02:00
Jan Kiszka	50933623e5	KVM: x86: Fix constant type in kvm_get_time_scale Older gcc versions complain about the improper type (for x86-32), 4.5 seems to fix this silently. However, we should better use the right type initially. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:08 +02:00
Jan Kiszka	07d6f555d5	KVM: VMX: Add AX to list of registers clobbered by guest switch By chance this caused no harm so far. We overwrite AX during switch to/from guest context, so we must declare this. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:07 +02:00
Arjan Koers	19b6a85b78	KVM guest: Move a printk that's using the clock before it's ready Fix a hang during SMP kernel boot on KVM that showed up after commit `489fb490db` (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0 (2.6.34.1). The problem only occurs when CONFIG_PRINTK_TIME is set. KVM-Stable-Tag. Signed-off-by: Arjan Koers <0h61vkll2ly8@xutrox.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:53:06 +02:00
Zachary Amsden	c285545f81	KVM: x86: TSC catchup mode Negate the effects of AN TYM spell while kvm thread is preempted by tracking conversion factor to the highest TSC rate and catching the TSC up when it has fallen behind the kernel view of time. Note that once triggered, we don't turn off catchup mode. A slightly more clever version of this is possible, which only does catchup when TSC rate drops, and which specifically targets only CPUs with broken TSC, but since these all are considered unstable_tsc(), this patch covers all necessary cases. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:05 +02:00
Zachary Amsden	34c238a1d1	KVM: x86: Rename timer function This just changes some names to better reflect the usage they will be given. Separated out to keep confusion to a minimum. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:05 +02:00
Zachary Amsden	5f4e3f8827	KVM: x86: Make math work for other scales The math in kvm_get_time_scale relies on the fact that NSEC_PER_SEC < 2^32. To use the same function to compute arbitrary time scales, we must extend the first reduction step to shrink the base rate to a 32-bit value, and possibly reduce the scaled rate into a 32-bit as well. Note we must take care to avoid an arithmetic overflow when scaling up the tps32 value (this could not happen with the fixed scaled value of NSEC_PER_SEC, but can happen with scaled rates above 2^31). Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:04 +02:00
Avi Kivity	49e9d557f9	KVM: VMX: Respect interrupt window in big real mode If an interrupt is pending, we need to stop emulation so we can inject it. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:02 +02:00
Mohammed Gamal	a92601bb70	KVM: VMX: Emulated real mode interrupt injection Replace the inject-as-software-interrupt hack we currently have with emulated injection. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Mohammed Gamal	63995653ad	KVM: Add kvm_inject_realmode_interrupt() wrapper This adds a wrapper function kvm_inject_realmode_interrupt() around the emulator function emulate_int_real() to allow real mode interrupt injection. [avi: initialize operand and address sizes before emulating interrupts] [avi: initialize rip for real mode interrupt injection] [avi: clear interrupt pending flag after emulating interrupt injection] Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:01 +02:00
Mohammed Gamal	4ab8e02404	KVM: x86 emulator: Expose emulate_int_real() Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:53:00 +02:00
Hillf Danton	cb16a7b387	KVM: MMU: fix counting of rmap entries in rmap_add() It seems that rmap entries are under counted. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:59 +02:00
Gleb Natapov	a0a07cd2c5	KVM: SVM: do not generate "external interrupt exit" if other exit is pending Nested SVM checks for external interrupt after injecting nested exception. In case there is external interrupt pending the code generates "external interrupt exit" and overwrites previous exit info. If previously injected exception already generated exit it will be lost. Signed-off-by: Gleb Natapov <gleb@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Avi Kivity	f4f5105087	KVM: Convert PIC lock from raw spinlock to ordinary spinlock The PIC code used to be called from preempt_disable() context, which wasn't very good for PREEMPT_RT. That is no longer the case, so move back from raw_spinlock_t to spinlock_t. Signed-off-by: Avi Kivity <avi@redhat.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Zachary Amsden	28e4639adf	KVM: x86: Fix kvmclock bug If preempted after kvmclock values are updated, but before hardware virtualization is entered, the last tsc time as read by the guest is never set. It underflows the next time kvmclock is updated if there has not yet been a successful entry / exit into hardware virt. Fix this by simply setting last_tsc to the newly read tsc value so that any computed nsec advance of kvmclock is nulled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:56 +02:00
Joerg Roedel	0959ffacf3	KVM: MMU: Don't track nested fault info in error-code This patch moves the detection whether a page-fault was nested or not out of the error code and moves it into a separate variable in the fault struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:55 +02:00
Avi Kivity	625831a3f4	KVM: VMX: Move fixup_rmode_irq() to avoid forward declaration No code changes. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	b463a6f744	KVM: Non-atomic interrupt injection Change the interrupt injection code to work from preemptible, interrupts enabled context. This works by adding a ->cancel_injection() operation that undoes an injection in case we were not able to actually enter the guest (this condition could never happen with atomic injection). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:54 +02:00
Avi Kivity	83422e17c1	KVM: VMX: Parameterize vmx_complete_interrupts() for both exit and entry Currently vmx_complete_interrupts() can decode event information from vmx exit fields into the generic kvm event queues. Make it able to decode the information from the entry fields as well by parametrizing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:52 +02:00
Avi Kivity	537b37e267	KVM: VMX: Move real-mode interrupt injection fixup to vmx_complete_interrupts() This allows reuse of vmx_complete_interrupts() for cancelling injections. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00
Avi Kivity	51aa01d13d	KVM: VMX: Split up vmx_complete_interrupts() vmx_complete_interrupts() does too much, split it up: - vmx_vcpu_run() gets the "cache important vmcs fields" part - a new vmx_complete_atomic_exit() gets the parts that must be done atomically - a new vmx_recover_nmi_blocking() does what its name says - vmx_complete_interrupts() retains the event injection recovery code This helps in reducing the work done in atomic context. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:51 +02:00
Avi Kivity	3842d135ff	KVM: Check for pending events before attempting injection Instead of blindly attempting to inject an event before each guest entry, check for a possible event first in vcpu->requests. Sites that can trigger event injection are modified to set KVM_REQ_EVENT: - interrupt, nmi window opening - ppr updates - i8259 output changes - local apic irr changes - rflags updates - gif flag set - event set on exit This improves non-injecting entry performance, and sets the stage for non-atomic injection. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:50 +02:00
Avi Kivity	b0bc3ee2b5	KVM: MMU: Fix regression with ept memory types merged into non-ept page tables Commit "KVM: MMU: Make tdp_enabled a mmu-context parameter" made real-mode set ->direct_map, and changed the code that merges in the memory type depend on direct_map instead of tdp_enabled. However, in this case what really matters is tdp, not direct_map, since tdp changes the pte format regardless of whether the mapping is direct or not. As a result, real-mode shadow mappings got corrupted with ept memory types. The result was a huge slowdown, likely due to the cache being disabled. Change it back as the simplest fix for the regression (real fix is to move all that to vmx code, and not use tdp_enabled as a synonym for ept). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:49 +02:00
Joerg Roedel	4c62a2dc92	KVM: X86: Report SVM bit to userspace only when supported This patch fixes a bug in KVM where it _always_ reports the support of the SVM feature to userspace. But KVM only supports SVM on AMD hardware and only when it is enabled in the kernel module. This patch fixes the wrong reporting. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:48 +02:00
Joerg Roedel	3d4aeaad8b	KVM: SVM: Report Nested Paging support to userspace This patch implements the reporting of the nested paging feature support to userspace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:47 +02:00
Joerg Roedel	55c5e464fc	KVM: SVM: Expect two more candidates for exit_int_info This patch adds INTR and NMI intercepts to the list of expected intercepts with an exit_int_info set. While this can't happen on bare metal it is architecturally legal and may happen with KVM's SVM emulation. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:46 +02:00
Joerg Roedel	4b16184c1c	KVM: SVM: Initialize Nested Nested MMU context on VMRUN This patch adds code to initialize the Nested Nested Paging MMU context when the L1 guest executes a VMRUN instruction and has nested paging enabled in its VMCB. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:46 +02:00
Joerg Roedel	5bd2edc341	KVM: SVM: Implement MMU helper functions for Nested Nested Paging This patch adds the helper functions which will be used in the mmu context for handling nested nested page faults. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:45 +02:00
Joerg Roedel	2d48a985c7	KVM: MMU: Track NX state in struct kvm_mmu With Nested Paging emulation the NX state between the two MMU contexts may differ. To make sure that always the right fault error code is recorded this patch moves the NX state into struct kvm_mmu so that the code can distinguish between L1 and L2 NX state. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:44 +02:00
Joerg Roedel	81407ca553	KVM: MMU: Allow long mode shadows for legacy page tables Currently the KVM softmmu implementation can not shadow a 32 bit legacy or PAE page table with a long mode page table. This is a required feature for nested paging emulation because the nested page table must always be in host format. So this patch implements the missing pieces to allow long mode page tables for page table types. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:43 +02:00
Joerg Roedel	651dd37a9c	KVM: MMU: Refactor mmu_alloc_roots function This patch factors out the direct-mapping paths of the mmu_alloc_roots function into a separate function. This makes it a lot easier to avoid all the unnecessary checks done in the shadow path which may break when running direct. In fact, this patch already fixes a problem when running PAE guests on a PAE shadow page table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	d41d1895eb	KVM: MMU: Introduce kvm_pdptr_read_mmu This function is implemented to load the pdptr pointers of the currently running guest (l1 or l2 guest). Therefore it takes care about the current paging mode and can read pdptrs out of l2 guest physical memory. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:42 +02:00
Joerg Roedel	ff03a073e7	KVM: MMU: Add kvm_mmu parameter to load_pdptrs function This function needs to be able to load the pdptrs from any mmu context currently in use. So change this function to take a kvm_mmu parameter to fit these needs. As a side effect this patch also moves the cached pdptrs from vcpu_arch into the kvm_mmu struct. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:41 +02:00
Joerg Roedel	d47f00a62b	KVM: X86: Propagate fetch faults KVM currently ignores fetch faults in the instruction emulator. With nested-npt we could have such faults. This patch adds the code to handle these. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:41 +02:00
Joerg Roedel	d4f8cf664e	KVM: MMU: Propagate the right fault back to the guest after gva_to_gpa This patch implements logic to make sure that either a page-fault/page-fault-vmexit or a nested-page-fault-vmexit is propagated back to the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:40 +02:00
Joerg Roedel	02f59dc9f1	KVM: MMU: Introduce init_kvm_nested_mmu() This patch introduces the init_kvm_nested_mmu() function which is used to re-initialize the nested mmu when the l2 guest changes its paging mode. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:39 +02:00
Joerg Roedel	3d06b8bfd4	KVM: MMU: Introduce kvm_read_nested_guest_page() This patch introduces the kvm_read_guest_page_x86 function which reads from the physical memory of the guest. If the guest is running in guest-mode itself with nested paging enabled it will read from the guest's guest physical memory instead. The patch also changes the code to use this function where it is necessary. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	2329d46d21	KVM: MMU: Make walk_addr_generic capable for two-level walking This patch uses kvm_read_guest_page_tdp to make the walk_addr_generic functions suitable for two-level page table walking. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:38 +02:00
Joerg Roedel	ec92fe44e7	KVM: X86: Add kvm_read_guest_page_mmu function This patch adds a function which can read from the guests physical memory or from the guest's guest physical memory. This will be used in the two-dimensional page table walker. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:37 +02:00
Joerg Roedel	6539e738f6	KVM: MMU: Implement nested gva_to_gpa functions This patch adds the functions to do a nested l2_gva to l1_gpa page table walk. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:36 +02:00
Joerg Roedel	14dfe855f9	KVM: X86: Introduce pointer to mmu context used for gva_to_gpa This patch introduces the walk_mmu pointer which points to the mmu-context currently used for gva_to_gpa translations. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:35 +02:00
Joerg Roedel	c30a358d33	KVM: MMU: Add infrastructure for two-level page walker This patch introduces a mmu-callback to translate gpa addresses in the walk_addr code. This is later used to translate l2_gpa addresses into l1_gpa addresses. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:34 +02:00
Joerg Roedel	1e301feb07	KVM: MMU: Introduce generic walk_addr function This is the first patch in the series towards a generic walk_addr implementation which could walk two-dimensional page tables in the end. In this first step the walk_addr function is renamed into walk_addr_generic which takes a mmu context as an additional parameter. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	8df25a328a	KVM: MMU: Track page fault data in struct vcpu This patch introduces a struct with two new fields in vcpu_arch for x86: * fault.address * fault.error_code This will be used to correctly propagate page faults back into the guest when we could have either an ordinary page fault or a nested page fault. In the case of a nested page fault the fault-address is different from the original address that should be walked. So we need to keep track of the real fault-address. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:33 +02:00
Joerg Roedel	3241f22da8	KVM: MMU: Let is_rsvd_bits_set take mmu context instead of vcpu This patch changes is_rsvd_bits_set() function prototype to take only a kvm_mmu context instead of a full vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	52fde8df7d	KVM: MMU: Introduce kvm_init_shadow_mmu helper function Some logic of the init_kvm_softmmu function is required to build the Nested Nested Paging context. So factor the required logic into a separate function and export it. Also make the whole init path suitable for more than one mmu context. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:32 +02:00
Joerg Roedel	cb659db8a7	KVM: MMU: Introduce inject_page_fault function pointer This patch introduces an inject_page_fault function pointer into struct kvm_mmu which will be used to inject a page fault. This will be used later when Nested Nested Paging is implemented. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	5777ed340d	KVM: MMU: Introduce get_cr3 function pointer This function pointer in the MMU context is required to implement Nested Nested Paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:31 +02:00
Joerg Roedel	1c97f0a04c	KVM: X86: Introduce a tdp_set_cr3 function This patch introduces a special set_tdp_cr3 function pointer in kvm_x86_ops which is only used for tdp enabled mmu contexts. This allows removing some hacks from svm code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:30 +02:00
Joerg Roedel	f43addd461	KVM: MMU: Make set_cr3 a function pointer in kvm_mmu This is necessary to implement Nested Nested Paging. As a side effect this allows some cleanups in the SVM nested paging code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:29 +02:00
Joerg Roedel	c5a78f2b64	KVM: MMU: Make tdp_enabled a mmu-context parameter This patch changes the tdp_enabled flag from its global meaning to the mmu-context and renames it to direct_map there. This is necessary for Nested SVM with emulation of Nested Paging where we need an extra MMU context to shadow the Nested Nested Page Table. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:28 +02:00
Joerg Roedel	957446afce	KVM: MMU: Check for root_level instead of long mode The walk_addr function checks for !is_long_mode in its 64 bit version. But what is meant here is a check for pae paging. Change the condition to really check for pae paging so that it also works with nested nested paging. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:52:27 +02:00
Jes Sorensen	7b91409822	KVM: x86: Emulate MSR_EBC_FREQUENCY_ID Some operating systems store data about the host processor at the time of installation and, when booted on a more up-to-date cpu, try to read MSR_EBC_FREQUENCY_ID. This has been found with XP. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:27 +02:00
Jes Sorensen	b9a52c4b78	x86: Define MSR_EBC_FREQUENCY_ID Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:26 +02:00
Roedel, Joerg	b75f4eb341	KVM: SVM: Clean up rip handling in vmrun emulation This patch changes the rip handling in the vmrun emulation path from using next_rip to the generic kvm register access functions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:25 +02:00
Joerg Roedel	cda0008299	KVM: SVM: Restore correct registers after sel_cr0 intercept emulation This patch implements restoring of the correct rip, rsp, and rax after the svm emulation in KVM injected a selective_cr0 write intercept into the guest hypervisor. The problem was that the vmexit is emulated in the instruction emulation which later commits the registers right after the write-cr0 instruction. So the l1 guest will continue to run with the l2 rip, rsp and rax resulting in unpredictable behavior. This patch is not the final word, it is just an easy patch to fix the issue. The real fix will be done when the instruction emulator is made aware of nested virtualization. Until this is done this patch fixes the issue and provides an easy way to fix this in -stable too. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:24 +02:00
Joerg Roedel	f87f928882	KVM: MMU: Fix 32 bit legacy paging with NPT This patch fixes 32 bit legacy paging with NPT enabled. The mmu_check_root call on the top-level of the loop causes root_gfn to take values (in the tdp_enabled path) which are outside of guest memory. So the mmu_check_root call fails at some point in the loop iteration, causing the guest to triple-fault. This patch changes the mmu_check_root calls to the places where they are really necessary. As a side-effect it introduces a check for the root of a pae page table too. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:52:23 +02:00
Xiao Guangrong	30644b902c	KVM: MMU: lower the audit frequency The audit has very high overhead, so we need to lower the frequency to ensure the guest is running. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:59 +02:00
Xiao Guangrong	eb2591865a	KVM: MMU: improve spte audit Both audit_mappings() and audit_sptes_have_rmaps() need to walk vcpu's page table, so we can do these checks in one spte walk Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:58 +02:00
Xiao Guangrong	49edf87806	KVM: MMU: improve active sp audit Both audit_rmap() and audit_write_protection() need to walk all active sps, so we can do these checks in one sp walk Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:57 +02:00
Xiao Guangrong	2f4f337248	KVM: MMU: move audit to a separate file Move the audit code from arch/x86/kvm/mmu.c to arch/x86/kvm/mmu_audit.c Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:57 +02:00
Xiao Guangrong	8b1fe17cc7	KVM: MMU: support disable/enable mmu audit dynamically Add a r/w module parameter named 'mmu_audit' that controls audit enable/disable: enable: echo 1 > /sys/module/kvm/parameters/mmu_audit disable: echo 0 > /sys/module/kvm/parameters/mmu_audit This patch does not change the logic Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:56 +02:00
Jes Sorensen	84e0cefa8d	KVM: Fix guest kernel crash on MSR_K7_CLK_CTL MSR_K7_CLK_CTL is a no longer documented MSR, which is only relevant on said old AMD CPU models. This change returns the expected value, which the Linux kernel is expecting to avoid writing back the MSR, plus it ignores all writes to the MSR. Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:55 +02:00
Avi Kivity	9ed049c3b6	KVM: i8259: Make ICW1 conform to spec ICW is not a full reset, instead it resets a limited number of registers in the PIC. Change ICW1 emulation to only reset those registers. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:54 +02:00
Avi Kivity	7d9ddaedd8	KVM: x86 emulator: clean up control flow in x86_emulate_insn() x86_emulate_insn() is full of things like if (rc != X86EMUL_CONTINUE) goto done; break; consolidate all of those at the end of the switch statement. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:54 +02:00
Avi Kivity	a4d4a7c188	KVM: x86 emulator: fix group 11 decoding for reg != 0 These are all undefined. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:53 +02:00
Avi Kivity	b9eac5f4d1	KVM: x86 emulator: use single stage decoding for mov instructions Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:52 +02:00
Avi Kivity	e90aa41e6c	KVM: Don't save/restore MSR_IA32_PERF_STATUS It is read/only; restoring it only results in annoying messages. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:51 +02:00
Marcelo Tosatti	eaa48512ba	KVM: SVM: init_vmcb should reset vcpu->efer Otherwise EFER_LMA bit is retained across a SIPI reset. Fixes guest cpu onlining. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:51 +02:00
Marcelo Tosatti	678041ad9d	KVM: SVM: reset mmu context in init_vmcb Since commit `aad827034e` no mmu reinitialization is performed via init_vmcb. Zero vcpu->arch.cr0 and pass the reset value as a parameter to kvm_set_cr0. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:50 +02:00
Avi Kivity	c41a15dd46	KVM: Fix pio trace direction out = write, in = read, not the other way round. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:49 +02:00
Xiao Guangrong	8e0e8afa82	KVM: MMU: remove count_rmaps() Nothing is checked in count_rmaps(), so remove it Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:49 +02:00
Xiao Guangrong	365fb3fdf6	KVM: MMU: rewrite audit_mappings_page() function There is a bug in this function: we call gfn_to_pfn() and kvm_mmu_gva_to_gpa_read() in atomic context (kvm_mmu_audit() is called under the protection of the mmu_lock spinlock). This patch fixes it by: - introducing gfn_to_pfn_atomic instead of gfn_to_pfn - getting the mapping gfn from kvm_mmu_page_get_gfn() It also adds a check for 'notrap' ptes in unsync/direct sps Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:48 +02:00
Xiao Guangrong	bc32ce2152	KVM: MMU: fix wrong not write protected sp report The audit code reports some sps as not write protected in current code; this is just a bug in audit_write_protection(), since: - an invalid sp does not need to be write protected - it uses an uninitialized local variable ('gfn') - it calls kvm_mmu_audit() outside of mmu_lock's protection Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:47 +02:00
Xiao Guangrong	0beb8d6604	KVM: MMU: check rmap for every spte The read-only spte also has a reverse mapping, so fix the code to check them, and also rename the function to reflect what it does Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:46 +02:00
Xiao Guangrong	9ad17b1001	KVM: MMU: fix compile warning in audit code fix: arch/x86/kvm/mmu.c: In function ‘kvm_mmu_unprotect_page’: arch/x86/kvm/mmu.c:1741: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c:1745: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘mmu_unshadow’: arch/x86/kvm/mmu.c:1761: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘set_spte’: arch/x86/kvm/mmu.c:2005: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 3 has type ‘gfn_t’ arch/x86/kvm/mmu.c: In function ‘mmu_set_spte’: arch/x86/kvm/mmu.c:2033: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 7 has type ‘gfn_t’ Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:46 +02:00
Jason Wang	23e7a7944f	KVM: pit: Do not check pending pit timer in vcpu thread Pit interrupt injection is done by a workqueue, so there is no need to check for a pending pit timer in the vcpu thread, which could lead to unnecessary unblocking of the vcpu. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:45 +02:00
Avi Kivity	6230f7fc04	KVM: x86 emulator: simplify ALU opcode block decode further The ALU opcode block is very regular; introduce D6ALU() to define decode flags for 6 instructions at a time. Suggested by Paolo Bonzini. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:43 +02:00
Avi Kivity	217fc9cfca	KVM: Fix build error due to 64-bit division in nsec_to_cycles() Use do_div() instead. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:43 +02:00
Avi Kivity	34d1f4905e	KVM: x86 emulator: trap and propagate #DE from DIV and IDIV Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:42 +02:00
Avi Kivity	f6b3597bde	KVM: x86 emulator: add macros for executing instructions that may trap Like DIV and IDIV. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:41 +02:00
Avi Kivity	739ae40606	KVM: x86 emulator: simplify instruction decode flags for opcodes 0F 00-FF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:41 +02:00
Avi Kivity	d269e3961a	KVM: x86 emulator: simplify instruction decode flags for opcodes E0-FF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:40 +02:00
Avi Kivity	d2c6c7adb1	KVM: x86 emulator: simplify instruction decode flags for opcodes C0-DF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:39 +02:00
Avi Kivity	50748613d1	KVM: x86 emulator: simplify instruction decode flags for opcodes A0-AF Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:38 +02:00
Avi Kivity	76e8e68d44	KVM: x86 emulator: simplify instruction decode flags for opcodes 80-8F Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:38 +02:00
Avi Kivity	48fe67b5f7	KVM: x86 emulator: simplify string instruction decode flags Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:37 +02:00
Avi Kivity	5315fbb223	KVM: x86 emulator: simplify ALU block (opcodes 00-3F) decode flags Use the new byte/word dual opcode decode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:36 +02:00
Avi Kivity	8d8f4e9f66	KVM: x86 emulator: support byte/word opcode pairs Many x86 instructions come in byte and word variants distinguished with bit 0 of the opcode. Add macros to aid in defining them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:35 +02:00
Avi Kivity	081bca0e6b	KVM: x86 emulator: refuse SrcMemFAddr (e.g. LDS) with register operand SrcMemFAddr is not defined with the modrm operand designating a register instead of a memory address. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:35 +02:00
Gleb Natapov	d2ddd1c483	KVM: x86 emulator: get rid of "restart" in emulation context. x86_emulate_insn() will return 1 if instruction can be restarted without re-entering a guest. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:34 +02:00
Gleb Natapov	3e2f65d57a	KVM: x86 emulator: move string instruction completion check into separate function Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:33 +02:00
Gleb Natapov	6e2fb2cadd	KVM: x86 emulator: Rename variable that shadows another local variable. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:32 +02:00
Wei Yongjun	cc4feed57f	KVM: x86 emulator: add CALL FAR instruction emulation (opcode 9a) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:31 +02:00
Xiao Guangrong	189be38db3	KVM: MMU: combine guest pte read between fetch and pte prefetch Combine guest pte read between guest pte check in the fetch path and pte prefetch Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:28 +02:00
Xiao Guangrong	957ed9effd	KVM: MMU: prefetch ptes when intercepted guest #PF Support prefetching ptes when a guest #PF is intercepted, to avoid a #PF on later accesses. If we meet any failure in the prefetch path, we exit it and do not try other ptes, to keep it from becoming a heavy path. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:27 +02:00
Zachary Amsden	1d5f066e0b	KVM: x86: Fix a possible backwards warp of kvmclock Kernel time, which advances in discrete steps, may progress much slower than TSC. As a result, when kvmclock is adjusted to a new base, the apparent time to the guest, which runs at a much higher, nsec scaled rate based on the current TSC, may have already been observed to have a larger value (kernel_ns + scaled tsc) than the value to which we are setting it (kernel_ns + 0). We must instead compute the clock as potentially observed by the guest for kernel_ns to make sure it does not go backwards. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	347bb4448c	x86: pvclock: Move scale_delta into common header The scale_delta function for shift / multiply with 31-bit precision moves to a common header so it can be used by both kernel and kvm module. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	ca84d1a24c	KVM: x86: Add clock sync request to hardware enable If there are active VCPUs which are marked as belonging to a particular hardware CPU, request a clock sync for them when enabling hardware; the TSC could be desynchronized on a newly arriving CPU, and we need to recompute guests system time relative to boot after a suspend event. This covers both cases. Note that it is acceptable to take the spinlock, as either no other tasks will be running and no locks held (BSP after resume), or other tasks will be guaranteed to drop the lock relatively quickly (AP on CPU_STARTING). Noting we now get clock synchronization requests for VCPUs which are starting up (or restarting), it is tempting to attempt to remove the arch/x86/kvm/x86.c CPU hot-notifiers at this time, however it is not correct to do so; they are required for systems with non-constant TSC as the frequency may not be known immediately after the processor has started until the cpufreq driver has had a chance to run and query the chipset. Updated: implement better locking semantics for hardware_enable Removed the hack of dropping and retaking the lock by adding the semantic that we always hold kvm_lock when hardware_enable is called. The one place that doesn't need to worry about it is resume, as resuming a frozen CPU, the spinlock won't be taken. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:24 +02:00
Zachary Amsden	46543ba45f	KVM: x86: Robust TSC compensation Make the match of TSC find TSC writes that are close to each other instead of perfectly identical; this allows the compensator to also work in migration / suspend scenarios. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	759379dd68	KVM: x86: Add helper functions for time computation Add a helper function to compute the kernel time and convert nanoseconds back to CPU specific cycles. Note that these must not be called in preemptible context, as that would mean the kernel could enter software suspend state, which would cause non-atomic operation. Also, convert the KVM_SET_CLOCK / KVM_GET_CLOCK ioctls to use the kernel time helper, these should be bootbased as well. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	48434c20e1	KVM: x86: Fix deep C-state TSC desynchronization When CPUs with unstable TSCs enter deep C-state, TSC may stop running. This causes us to require resynchronization. Since we can't tell when this may potentially happen, we assume the worst by forcing re-compensation for it at every point the VCPU task is descheduled. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	e48672fa25	KVM: x86: Unify TSC logic Move the TSC control logic from the vendor backends into x86.c by adding adjust_tsc_offset to x86 ops. Now all TSC decisions can be done in one place. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:23 +02:00
Zachary Amsden	6755bae8e6	KVM: x86: Warn about unstable TSC If creating an SMP guest with unstable host TSC, issue a warning Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	8cfdc00085	KVM: x86: Make cpu_tsc_khz updates use local CPU This simplifies much of the init code; we can now simply always call tsc_khz_changed, optionally passing it a new value, or letting it figure out the existing value (while interrupts are disabled, and thus, by inference from the rule, not raceful against CPU hotplug or frequency updates, which will issue IPIs to the local CPU to perform this very same task). Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	f38e098ff3	KVM: x86: TSC reset compensation Attempt to synchronize TSCs which are reset to the same value. In the case of a reliable hardware TSC, we can just re-use the same offset, but on non-reliable hardware, we can get closer by adjusting the offset to match the elapsed time. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	99e3e30aee	KVM: x86: Move TSC offset writes to common code Also, ensure that the storing of the offset and the reading of the TSC are never preempted by taking a spinlock. While the lock is overkill now, it is useful later in this patch series. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	f4e1b3c8bd	KVM: x86: Convert TSC writes to TSC offset writes Change svm / vmx to be the same internally and write TSC offset instead of bare TSC in helper functions. Isolated as a single patch to contain code movement. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:22 +02:00
Zachary Amsden	ae38436b78	KVM: x86: Drop vm_init_tsc This is used only by the VMX code, and is not done properly; if the TSC is indeed backwards, it is out of sync, and will need proper handling in the logic at each and every CPU change. For now, drop this test during init as misguided. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Wei Yongjun	45bf21a8ce	KVM: MMU: fix missing percpu counter destroy Commit ad05c88266b4cce1c820928ce8a0fb7690912ba1 (KVM: create aggregate kvm_total_used_mmu_pages value) introduced the percpu counter kvm_total_used_mmu_pages but never destroys it; this may cause an oops on rmmod & modprobe. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Xiaotian Feng	80b63faf02	KVM: MMU: fix regression from rework mmu_shrink() code The latest kvm mmu_shrink code rework makes the kernel change kvm->arch.n_used_mmu_pages/ kvm->arch.n_max_mmu_pages at kvm_mmu_free_page/kvm_mmu_alloc_page, which is called by kvm_mmu_commit_zap_page. So kvm->arch.n_used_mmu_pages or kvm_mmu_available_pages(vcpu->kvm) is unchanged after kvm_mmu_prepare_zap_page(). This caused kvm_mmu_change_mmu_pages/__kvm_mmu_free_some_pages to loop forever. Moving kvm_mmu_commit_zap_page makes the while loop perform as normal. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Tested-by: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:21 +02:00
Wei Yongjun	e4abac67b7	KVM: x86 emulator: add JrCXZ instruction emulation Add JrCXZ instruction emulation (opcode 0xe3). Used by the FreeBSD boot loader. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:20 +02:00
Wei Yongjun	09b5f4d3c4	KVM: x86 emulator: add LDS/LES/LFS/LGS/LSS instruction emulation Add LDS/LES/LFS/LGS/LSS instruction emulation. (opcode 0xc4, 0xc5, 0x0f 0xb2, 0x0f 0xb4~0xb5) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:51:20 +02:00
Dave Hansen	45221ab668	KVM: create aggregate kvm_total_used_mmu_pages value Of slab shrinkers, the VM code says: * Note that 'shrink' will be passed nr_to_scan == 0 when the VM is * querying the cache size, so a fastpath for that case is appropriate. and it means it. Look at how it calls the shrinkers: nr_before = (shrinker->shrink)(0, gfp_mask); shrink_ret = (shrinker->shrink)(this_scan, gfp_mask); So, if you do anything stupid in your shrinker, the VM will doubly punish you. The mmu_shrink() function takes the global kvm_lock, then acquires every VM's kvm->mmu_lock in sequence. If we have 100 VMs, then we're going to take 101 locks. We do it twice, so each call takes 202 locks. If we're under memory pressure, we can have each cpu trying to do this. It can get really hairy, and we've seen lock spinning in mmu_shrink() be the dominant entry in profiles. This is guaranteed to optimize at least half of those lock acquisitions away. It removes the need to take any of the locks when simply trying to count objects. A 'percpu_counter' can be a large object, but we only have one of these for the entire system. There are not any better alternatives at the moment, especially ones that handle CPU hotplug. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:19 +02:00
Dave Hansen	49d5ca2663	KVM: replace x86 kvm n_free_mmu_pages with n_used_mmu_pages Doing this makes the code much more readable. That's borne out by the fact that this patch removes code. "used" also happens to be the number that we need to return back to the slab code when our shrinker gets called. Keeping this value as opposed to free makes the next patch simpler. So, 'struct kvm' is kzalloc()'d. 'struct kvm_arch' is a structure member (and not a pointer) of 'struct kvm'. That means they start out zeroed. I _think_ they get initialized properly by kvm_mmu_change_mmu_pages(). But, that only happens via kvm ioctls. Another benefit of storing 'used' instead of 'free' is that the values are consistent from the moment the structure is allocated: no negative "used" value. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	39de71ec53	KVM: rename x86 kvm->arch.n_alloc_mmu_pages arch.n_alloc_mmu_pages is a poor choice of name. This value truly means, "the number of pages which _may_ be allocated". But, reading the name, "n_alloc_mmu_pages" implies "the number of allocated mmu pages", which is dead wrong. It's really the high watermark, so let's give it a name to match: n_max_mmu_pages. This change will make the next few patches much more obvious and easy to read. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:18 +02:00
Dave Hansen	e0df7b9f6c	KVM: abstract kvm x86 mmu->n_free_mmu_pages "free" is a poor name for this value. In this context, it means, "the number of mmu pages which this kvm instance should be able to allocate." But "free" implies much more: that the objects are there and ready for use. "available" is a much better description, especially when you see how it is calculated. In this patch, we abstract its use into a function. We'll soon replace the function's contents by calculating the value in a different way. All of the reads of n_free_mmu_pages are taken care of in this patch. The modification sites will be handled in a patch later in the series. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Tim Pepper <lnxninja@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:17 +02:00
Avi Kivity	6142914280	KVM: x86 emulator: implement CWD (opcode 99) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:16 +02:00
Avi Kivity	d46164dbd9	KVM: x86 emulator: implement IMUL REG, R/M, IMM (opcode 69) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:16 +02:00
Avi Kivity	7db41eb762	KVM: x86 emulator: add Src2Imm decoding Needed for 3-operand IMUL. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:15 +02:00
Avi Kivity	39f21ee546	KVM: x86 emulator: consolidate immediate decode into a function Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:14 +02:00
Avi Kivity	48bb5d3c40	KVM: x86 emulator: implement RDTSC (opcode 0F 31) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:14 +02:00
Avi Kivity	7077aec0bc	KVM: x86 emulator: remove SrcImplicit Useless. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:13 +02:00
Avi Kivity	5c82aa2998	KVM: x86 emulator: implement IMUL REG, R/M (opcode 0F AF) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	f3a1b9f496	KVM: x86 emulator: implement IMUL REG, R/M, imm8 (opcode 6B) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	40ece7c729	KVM: x86 emulator: implement RET imm16 (opcode C2) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	b250e60589	KVM: x86 emulator: add SrcImmU16 operand type Used for RET NEAR instructions. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	0ef753b8c3	KVM: x86 emulator: implement CALL FAR (FF /3) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:12 +02:00
Avi Kivity	7af04fc05c	KVM: x86 emulator: implement DAS (opcode 2F) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	fb2c264105	KVM: x86 emulator: Use a register for ____emulate_2op() destination Most x86 two operand instructions allow the destination to be a memory operand, but IMUL (for example) requires that the destination be a register. Change ____emulate_2op() to take a register for both source and destination so we can invoke IMUL. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	b3b3d25a12	KVM: x86 emulator: pass destination type to ____emulate_2op() We'll need it later so we can use a register for the destination. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Wei Yongjun	f2f3184534	KVM: x86 emulator: add LOOP/LOOPcc instruction emulation Add LOOP/LOOPcc instruction emulation (opcode 0xe0~0xe2). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Wei Yongjun	e8b6fa70e3	KVM: x86 emulator: add CBW/CWDE/CDQE instruction emulation Add CBW/CWDE/CDQE instruction emulation (opcode 0x98). Used by FreeBSD's boot loader. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:11 +02:00
Avi Kivity	0fa6ccbd28	KVM: x86 emulator: fix REPZ/REPNZ termination condition EFLAGS.ZF needs to be checked after each iteration, not before. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:10 +02:00
Avi Kivity	f6b33fc504	KVM: x86 emulator: implement SCAS (opcodes AE, AF) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:10 +02:00
Avi Kivity	5c56e1cf7a	KVM: x86 emulator: fix INTn emulation not pushing EFLAGS and CS emulate_push() only schedules a push; it doesn't actually push anything. Call writeback() to flush out the write. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	a13a63faa6	KVM: x86 emulator: remove dup code of in/out instruction Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	41167be544	KVM: x86 emulator: change OUT instruction to use dst instead of src Change OUT instruction to use dst instead of src, so we can reuse that code for all out instructions. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	943858e275	KVM: x86 emulator: introduce DstImmUByte for dst operand decode Introduce DstImmUByte for dst operand decode, which will be used for out instruction. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	c483c02ad3	KVM: x86 emulator: remove useless label from x86_emulate_insn() Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:09 +02:00
Wei Yongjun	ee45b58efe	KVM: x86 emulator: add setcc instruction emulation Add setcc instruction emulation (opcode 0x0f 0x90~0x9f) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:08 +02:00
Wei Yongjun	92f738a52b	KVM: x86 emulator: add XADD instruction emulation Add XADD instruction emulation (opcode 0x0f 0xc0~0xc1) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:06 +02:00
Wei Yongjun	31be40b398	KVM: x86 emulator: put register operand write back to a function Introduce function write_register_operand() to write back the register operand. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:06 +02:00
Mohammed Gamal	8ec4722dd2	KVM: Separate emulation context initialization in a separate function The code for initializing the emulation context is duplicated at two locations (emulate_instruction() and kvm_task_switch()). Move it into a separate function and call that function from both. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Wei Yongjun	d9574a25af	KVM: x86 emulator: add bsf/bsr instruction emulation Add bsf/bsr instruction emulation (opcode 0x0f 0xbc~0xbd) Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Mohammed Gamal	8c5eee30a9	KVM: x86 emulator: Fix emulate_grp3 return values This patch lets emulate_grp3() return X86EMUL_* return codes instead of hardcoded ones. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Mohammed Gamal	3f9f53b0d5	KVM: x86 emulator: Add unary mul, imul, div, and idiv instructions This adds unary mul, imul, div, and idiv instructions (group 3 r/m 4-7). Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:04 +02:00
Wei Yongjun	ba7ff2b76d	KVM: x86 emulator: mask group 8 instruction as BitOp Mark group 8 instructions as BitOp, so we can share the code for adjusting the source operand. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:03 +02:00
Wei Yongjun	3885f18fe3	KVM: x86 emulator: do not adjust the address for immediate source Adjust the dst address for a register source, but do not adjust it for an immediate source. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:02 +02:00
Wei Yongjun	35c843c485	KVM: x86 emulator: fix negative bit offset BitOp instruction emulation If the bit offset operand is a negative number, the BitOp instruction returns the wrong value. This patch fixes it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:01 +02:00
Mohammed Gamal	8744aa9aad	KVM: x86 emulator: Add stc instruction (opcode 0xf9) Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:01 +02:00
Wei Yongjun	c034da8b92	KVM: x86 emulator: using SrcOne for instruction d0/d1 decoding Using SrcOne for instruction d0/d1 decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Wei Yongjun	36089fed70	KVM: x86 emulator: disable writeback when decode dest operand This patch disables writeback when decoding the dest operand if the dest type is ImplicitOps or not specified. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Wei Yongjun	06cb704611	KVM: x86 emulator: use SrcAcc to simplify stos decoding Use SrcAcc to simplify stos decoding. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Mohammed Gamal	6e154e56b4	KVM: x86 emulator: Add into, int, and int3 instructions (opcodes 0xcc-0xce) This adds support for int instructions to the emulator. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:51:00 +02:00
Mohammed Gamal	160ce1f1a8	KVM: x86 emulator: Allow accessing IDT via emulator ops The patch adds a new member get_idt() to x86_emulate_ops. It also adds a function to get the idt in order to be used by the emulator. This is needed for real mode interrupt injection and the emulation of int instructions. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Wei Yongjun	d3ad624329	KVM: x86 emulator: simplify two-byte opcode check Two-byte opcodes always start with 0x0F, and the decode flags of opcode 0x0F are always 0, so remove the duplicate check. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:59 +02:00
Alexander Graf	ba49296236	KVM: Move kvm_guest_init out of generic code Currently x86 is the only architecture that uses kvm_guest_init(). With PowerPC we're getting a second user, but the signature is different there and we don't need to export it, as it uses the normal kernel init framework. So let's move the x86 specific definition of that function over to the x86 specific header file. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:49 +02:00
Mohammed Gamal	34698d8c61	KVM: x86 emulator: Fix nop emulation If a nop instruction is encountered, we jump directly to the done label. This skips updating rip. Break out of the switch case instead. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:41 +02:00
Avi Kivity	2dbd0dd711	KVM: x86 emulator: Decode memory operands directly into a 'struct operand' Since modrm operand can be either register or memory, decoding it into a 'struct operand', which can represent both, is simpler. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:40 +02:00
Avi Kivity	1f6f05800e	KVM: x86 emulator: change invlpg emulation to use src.mem.addr Instead of using modrm_ea, which will soon be gone. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:39 +02:00
Avi Kivity	342fc63095	KVM: x86 emulator: switch LEA to use SrcMem decoding The NoAccess flag will prevent memory from being accessed. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:38 +02:00
Avi Kivity	5a506b125f	KVM: x86 emulator: add NoAccess flag for memory instructions that skip access Use for INVLPG, which accesses the tlb, not memory. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:37 +02:00
Avi Kivity	b27f38563d	KVM: x86 emulator: use struct operand for mov reg,dr and mov dr,reg for reg op This is an ordinary modrm source or destination; use the standard structure representing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:36 +02:00
Avi Kivity	1a0c7d44e4	KVM: x86 emulator: use struct operand for mov reg,cr and mov cr,reg for reg op This is an ordinary modrm source or destination; use the standard structure representing it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	cecc9e3916	KVM: x86 emulator: mark mov cr and mov dr as 64-bit instructions in long mode Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	7f9b4b75be	KVM: x86 emulator: introduce Op3264 for mov cr and mov dr instructions The operands for these instructions are 32 bits or 64 bits, depending on long mode, and ignoring REX prefixes, or the operand size prefix. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:35 +02:00
Avi Kivity	1e87e3efe7	KVM: x86 emulator: simplify REX.W check (x && (x & y)) == (x & y) Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	d4709c78ee	KVM: x86 emulator: drop use_modrm_ea Unused (and has never been). Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:34 +02:00
Avi Kivity	91ff3cb43c	KVM: x86 emulator: put register operand fetch into a function The code is repeated three times, put it into fetch_register_operand() Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	3d9e77dff8	KVM: x86 emulator: use SrcAcc to simplify xchg decoding Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	4515453964	KVM: x86 emulator: simplify xchg decode tables Use X8() to avoid repetition. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	1a6440aef6	KVM: x86 emulator: use correct type for memory address in operands Currently we use a void pointer for memory addresses. That's wrong since these are guest virtual addresses which are not directly dereferenceable by the host. Use the correct type, unsigned long. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Avi Kivity	09ee57cdae	KVM: x86 emulator: push segment override out of decode_modrm() Let it compute modrm_seg instead, and have the caller apply it. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:33 +02:00
Joerg Roedel	dbe7758482	KVM: SVM: Check for asid != 0 on nested vmrun This patch lets a nested vmrun fail if the L1 hypervisor left the asid zero. This fixes the asid_zero unit test. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Joerg Roedel	52c65a30a5	KVM: SVM: Check for nested vmrun intercept before emulating vmrun This patch lets the nested vmrun fail if the L1 hypervisor has not intercepted vmrun. This fixes the "vmrun intercept check" unit test. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Xiao Guangrong	4132779b17	KVM: MMU: mark page dirty only when page is really written Mark the page dirty only when it is really written; this is more exact and also fixes dirty page marking in the speculation path Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:32 +02:00
Xiao Guangrong	8672b7217a	KVM: MMU: move bits lost judgement into a separate function Introduce the spte_has_volatile_bits() function to judge whether spte bits may be lost; it is more readable and can help us clean up code later Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:31 +02:00
Xiao Guangrong	251464c464	KVM: MMU: using kvm_set_pfn_accessed() instead of mark_page_accessed() It's a small cleanup: use kvm_set_pfn_accessed() instead of mark_page_accessed() Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:30 +02:00
Gleb Natapov	4fc40f076f	KVM: x86 emulator: check io permissions only once for string pio Do not recheck io permission on every iteration. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:29 +02:00
Avi Kivity	9928ff608b	KVM: x86 emulator: fix LMSW able to clear cr0.pe LMSW is documented not to be able to clear cr0.pe; make it so. Signed-off-by: Avi Kivity <avi@redhat.com>	2010-10-24 10:50:28 +02:00
Gleb Natapov	e85d28f8e8	KVM: x86 emulator: don't update vcpu state if instruction is restarted No need to update vcpu state since instruction is in the middle of the emulation. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:27 +02:00
Avi Kivity	63540382cc	KVM: x86 emulator: convert some push instructions to direct decode Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:26 +02:00
Avi Kivity	d0e533255d	KVM: x86 emulator: allow repeat macro arguments to contain commas Needed for repeating instructions with execution functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:25 +02:00
Avi Kivity	73fba5f4fe	KVM: x86 emulator: move decode tables downwards So they can reference execution functions. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:25 +02:00
Avi Kivity	dde7e6d12a	KVM: x86 emulator: move x86_decode_insn() downwards No code changes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:24 +02:00
Avi Kivity	ef65c88912	KVM: x86 emulator: allow storing emulator execution function in decode tables Instead of looking up the opcode twice (once for decode flags, once for the big execution switch) look up both flags and function in the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:22 +02:00
Avi Kivity	9aabc88fc8	KVM: x86 emulator: store x86_emulate_ops in emulation context It doesn't ever change, so we don't need to pass it around everywhere. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:21 +02:00
Avi Kivity	ab85b12b1a	KVM: x86 emulator: move ByteOp and Dst back to bits 0:3 Now that the group index no longer exists, the space is free. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:20 +02:00
Avi Kivity	3885d530b0	KVM: x86 emulator: drop support for old-style groups Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:19 +02:00
Avi Kivity	9f5d3220e3	KVM: x86 emulator: convert group 9 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:18 +02:00
Avi Kivity	2cb20bc8af	KVM: x86 emulator: convert group 8 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:18 +02:00
Avi Kivity	2f3a9bc9eb	KVM: x86 emulator: convert group 7 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:16 +02:00
Avi Kivity	b67f9f0741	KVM: x86 emulator: convert group 5 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:15 +02:00
Avi Kivity	591c9d20a3	KVM: x86 emulator: convert group 4 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:14 +02:00
Avi Kivity	ee70ea30ee	KVM: x86 emulator: convert group 3 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:13 +02:00
Avi Kivity	99880c5cd5	KVM: x86 emulator: convert group 1A to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:12 +02:00
Avi Kivity	5b92b5faff	KVM: x86 emulator: convert group 1 to new style Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:11 +02:00
Avi Kivity	120df8902d	KVM: x86 emulator: allow specifying group directly in opcode Instead of having a group number, store the group table pointer directly in the opcode. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:10 +02:00
Avi Kivity	793d5a8d6b	KVM: x86 emulator: reserve group code 0 We'll be using that to distinguish between new-style and old-style groups. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:09 +02:00
Avi Kivity	42a1c52095	KVM: x86 emulator: move group tables to top No code changes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:08 +02:00
Avi Kivity	fd853310a1	KVM: x86 emulator: Add wrappers for easily defining opcodes Once 'struct opcode' grows, its initializer will become more complicated. Wrap the simple initializers in a D() macro, and replace the empty initializers with an even simpler N macro. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:08 +02:00
Avi Kivity	d65b1dee40	KVM: x86 emulator: introduce 'struct opcode' This will hold all the information known about the opcode. Currently, this is just the decode flags. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:07 +02:00
Avi Kivity	ea9ef04e19	KVM: x86 emulator: drop parentheses in repeat macros The parentheses make it impossible to use the macros with initializers that require braces. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:06 +02:00
Mohammed Gamal	62bd430e6d	KVM: x86 emulator: Add IRET instruction This patch adds the IRET instruction (opcode 0xcf). Currently, only IRET in real mode is emulated. Protected mode support is to be added later if needed. Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Reviewed-by: Avi Kivity <avi@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:05 +02:00
Joerg Roedel	7a190667bb	KVM: SVM: Emulate next_rip svm feature This patch implements the emulations of the svm next_rip feature in the nested svm implementation in kvm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:04 +02:00
Joerg Roedel	3f6a9d1693	KVM: SVM: Sync efer back into nested vmcb This patch fixes a bug in a nested hypervisor that heavily switches between real-mode and long-mode. The problem is fixed by syncing back efer into the guest vmcb on emulated vmexit. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:03 +02:00
Xiao Guangrong	19ada5c4b6	KVM: MMU: remove valueless output message After commit 53383eaad08d, '*spte' has already been updated before rmap_remove() is called (in most cases it is 'shadow_trap_nonpresent_pte'), so remove this information from the error message Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:02 +02:00
Avi Kivity	d359192fea	KVM: VMX: Use host_gdt variable wherever we need the host gdt Now that we have the host gdt conveniently stored in a variable, make use of it instead of querying the cpu. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:01 +02:00
Avi Kivity	e071edd5ba	KVM: x86 emulator: unify the two Group 3 variants Use just one group table for byte (F6) and word (F7) opcodes. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:50:00 +02:00
Avi Kivity	dfe11481d8	KVM: x86 emulator: Allow LOCK prefix for NEG and NOT Opcodes F6/2, F6/3, F7/2, F7/3. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:59 +02:00
Avi Kivity	4968ec4e26	KVM: x86 emulator: simplify Group 1 decoding Move operand decoding to the opcode table, keep lock decoding in the group table. This allows us to consolidate the four variants of Group 1 into one group. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:58 +02:00
Avi Kivity	52811d7de5	KVM: x86 emulator: mix decode bits from opcode and group decode tables Allow bits that are common to all members of a group to be specified in the opcode table instead of the group table. This allows some simplification of the decode tables. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:58 +02:00
Avi Kivity	047a481809	KVM: x86 emulator: add Undefined decode flag Add a decode flag to indicate the instruction is invalid. Will come in useful later, when we mix decode bits from the opcode and group table. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:57 +02:00
Avi Kivity	2ce495365f	KVM: x86 emulator: Make group storage bits separate from operand bits Currently group bits are stored in bits 0:7, where operand bits are stored. Make group bits be 0:3, and move the existing bits 0:3 to 16:19, so we can mix group and operand bits. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:55 +02:00
Avi Kivity	880a188378	KVM: x86 emulator: consolidate Jcc rel32 decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:55 +02:00
Avi Kivity	be8eacddbd	KVM: x86 emulator: consolidate CMOVcc decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:53 +02:00
Avi Kivity	b6e6153885	KVM: x86 emulator: consolidate MOV reg, imm decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:52 +02:00
Avi Kivity	b3ab3405fe	KVM: x86 emulator: consolidate Jcc rel8 decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:51 +02:00
Avi Kivity	3849186c38	KVM: x86 emulator: consolidate push/pop reg decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:49 +02:00
Avi Kivity	749358a6b4	KVM: x86 emulator: consolidate inc/dec reg decoding Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:48 +02:00
Avi Kivity	83babbca46	KVM: x86 emulator: add macros for repetitive instructions Some instructions are repetitive in the opcode space, add macros for consolidating them. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:48 +02:00
Avi Kivity	91269b8f94	KVM: x86 emulator: fix handling for unemulated instructions If an instruction is present in the decode tables but not in the execution switch, it will be emulated as a NOP. An example is IRET (0xcf). Fix by adding default: labels to the execution switches. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-24 10:49:47 +02:00
Jiri Slaby	e4072a9a9d	x86, printk: Get rid of <0> from stack output The stack output currently looks like this: 7fffffffffffffff 0000000a00000000 ffffffff81093341 0000000000000046 <0> ffff88003a545fd8 0000000000000000 0000000000000000 00007fffa39769c0 <0> ffff88003e403f58 ffffffff8102fc4c ffff88003e403f58 ffff88003e403f78 The superfluous <0> are caused by recent printk KERN_CONT change. <*> is now ignored in printk unless some text follows the level and even then it still has to be the first in the format message. Note that the log_lvl parameter is now completely ignored in show_stack_log_lvl and the stack is dumped with the default level (like for quite some time already). It behaves the same as the rest of the dump, function traces are dumped in the very same manner. Only Code and maybe some lines are printed with EMERG level. Unfortunately I see no way how to fix this conceptually to have the whole oops/BUG/panic output with the same level, so this removed only the superfluous characters for the time being. Just for illustration: <4>Process kworker/0:0 (pid: 0, threadinfo ffff88003c8a6000, task ffff88003c85c100) <0>Stack: <4> ffffffff818022c0 0000000a00000001 0000000000000001 0000000000000046 <4> ffff88003c8a7fd8 0000000000000001 ffff88003c8a7e58 0000000000000000 <4> ffff88003e503f48 ffffffff8102fc4c ffff88003e503f48 ffff88003e503f68 <0>Call Trace: <0> <IRQ> <4> [<ffffffff8102fc4c>] ? call_softirq+0x1c/0x30 ... <0>Code: 00 01 00 00 65 8b 04 25 80 c5 00 00 c7 45 ... Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: jirislaby@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1287586131-16222-1-git-send-email-jslaby@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-23 20:03:03 +02:00
Thomas Gleixner	23f9b26715	x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() probe_nr_irqs_gsi() is called right after ioapic_init_mappings() and there are no other users. Move it into ioapic_init_mappings() so the declaration can disappear and the function can become static. Rename ioapic_init_mappings() to ioapic_and_gsi_init() to reflect that change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1287510389-8388-2-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>	2010-10-23 17:27:50 +02:00
Thomas Gleixner	5a7ae78fd4	x86: Allow platforms to force enable apic Some embedded x86 platforms don't set up the APIC in the BIOS/bootloader and would be forced to add "lapic" on the kernel command line. That's a bit awkward. Split out the force enable code from detect_init_APIC() and allow platform code to call it from the platform setup. That avoids the command line parameter and possible replication of the MSR dance in the force enable code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <1287510389-8388-1-git-send-email-dirk.brandewie@gmail.com> Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>	2010-10-23 17:27:43 +02:00
Linus Torvalds	02f36038c5	Merge branches 'softirq-for-linus', 'x86-debug-for-linus', 'x86-numa-for-linus', 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'softirq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: softirqs: Make wakeup_softirqd static * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, asm: Restore parentheses around one pushl_cfi argument x86, asm: Fix ancient-GAS workaround x86, asm: Fix CFI macro invocations to deal with shortcomings in gas * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA * 'x86-quirks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: HPET force enable for CX700 / VIA Epia LT * 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, setup: Use string copy operation to optimize copy in kernel compression * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Use allocated buffer in tlb_uv.c:tunables_read() * 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers.	2010-10-23 08:25:36 -07:00
Linus Torvalds	10f2a2b0f6	Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, mm: Add an initial page table for core bootstrapping	2010-10-22 20:37:50 -07:00
Linus Torvalds	8814011679	Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kdb,debug_core: adjust master cpu switch logic against new debug_core locking debug_core: refactor locking for master/slave cpus x86,kgdb: remove unnecessary call to kgdb_correct_hw_break() debug_core: disable hw_breakpoints on all cores in kgdb_cpu_enter() kdb,kgdb: fix sparse fixups kdb: Fix oops in kdb_unregister kdb,ftdump: Remove reference to internal kdb include kdb: Allow kernel loadable modules to add kdb shell functions debug_core: stop rcu warnings on kernel resume debug_core: move all watch dog syncs to a single function x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35	2010-10-22 20:35:12 -07:00
Linus Torvalds	0fc0531e0a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: update comments to reflect that percpu allocations are always zero-filled percpu: Optimize __get_cpu_var() x86, percpu: Optimize this_cpu_ptr percpu: clear memory allocated with the km allocator percpu: fix build breakage on s390 and cleanup build configuration tests percpu: use percpu allocator on UP too percpu: reduce PCPU_MIN_UNIT_SIZE to 32k vmalloc: pcpu_get/free_vm_areas() aren't needed on UP Fixed up trivial conflicts in include/linux/percpu.h	2010-10-22 17:31:36 -07:00
H. Peter Anvin	fd35fbcdd1	x86-64, asm: Use fxsaveq/fxrstorq in more places Checkin `d7acb92fea` made use of fxsaveq in fpu_fxsave() if the assembler supports it; this adds fxsaveq/fxrstorq to fxrstor_checking() and fxsave_user() as well. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <AANLkTi=RKyHLNTq6iomZOXkc6Zw1j9iAgsq8388XmzwN@mail.gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-22 15:33:38 -07:00
Dongdong Deng	39a0715f5a	x86,kgdb: remove unnecessary call to kgdb_correct_hw_break() The kernel debug_core invokes hw breakpoint install and removal via call backs. The architecture specific kgdb stubs only need to implement the call backs and not actually call the functions. Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> CC: x86@kernel.org CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: H. Peter Anvin <hpa@zytor.com>	2010-10-22 15:34:13 -05:00
Jason Wessel	91b152aa85	kdb,kgdb: fix sparse fixups Fix the following sparse warnings: kdb_main.c:328:5: warning: symbol 'kdbgetu64arg' was not declared. Should it be static? kgdboc.c:246:12: warning: symbol 'kgdboc_early_init' was not declared. Should it be static? kgdb.c:652:26: warning: incorrect type in argument 1 (different address spaces) kgdb.c:652:26: expected void const ptr kgdb.c:652:26: got struct perf_event [noderef] <asn:3>pev The one in kgdb.c required the (void __force) because of the return code from register_wide_hw_breakpoint looking like: return (void __percpu __force *)ERR_PTR(err); Signed-off-by: Jason Wessel <jason.wessel@windriver.com>	2010-10-22 15:34:12 -05:00
Jason Wessel	fad99fac26	x86,kgdb: fix debugger hw breakpoint test regression in 2.6.35 HW breakpoints events stopped working correctly with kgdb as a result of commit: `018cbffe68` (Merge commit 'v2.6.33' into perf/core), later commit: `ba773f7c51` (x86,kgdb: Fix hw breakpoint regression) allowed breakpoints to propagate to the debugger core but did not completely address the original regression in functionality found in 2.6.35. When the DR_STEP flag is set in dr6 along with any of the DR_TRAP bits, the kgdb exception handler will enter once from the hw_breakpoint API call back and again from the die notifier for do_debug(), which causes the debugger to stop twice and also for the kgdb regression tests to fail running under kvm with: echo V2I1 > /sys/module/kgdbts/parameters/kgdbts To address the problem, the kgdb overflow handler needs to implement the same logic as the ptrace overflow handler call back with respect to updating the virtual copy of dr6. This will allow the kgdb do_debug() die notifier to properly handle the exception and the attached debugger, or kgdb test suite, will only receive a single notification. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: x86@kernel.org	2010-10-22 15:34:10 -05:00
Stefano Stabellini	0e058e5277	xen: add a missing #include to arch/x86/pci/xen.c Add missing #include <asm/io_apic.h> to arch/x86/pci/xen.c. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>	2010-10-22 21:26:02 +01:00
Stefano Stabellini	ff12849a7a	xen: mask the MTRR feature from the cpuid We don't want Linux to think that the cpu supports MTRRs when running under Xen because MTRR operations could only be performed through hypercalls. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:26:02 +01:00
Juan Quintela	4ec5387cc3	xen: add the direct mapping area for ISA bus access add the direct mapping area for ISA bus access when running as initial domain Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:47 +01:00
Stefano Stabellini	801fd14a72	xen: use vcpu_ops to setup cpu masks Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:45 +01:00
Jeremy Fitzhardinge	98511f3532	xen: map a dummy page for local apic and ioapic in xen_set_fixmap Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:44 +01:00
Qing He	f731e3ef02	xen: remap MSIs into pirqs when running as initial domain Implement xen_create_msi_irq to create an msi and remap it as pirq. Use xen_create_msi_irq to implement an initial domain specific version of setup_msi_irqs. Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Yunhong Jiang <yunhong.jiang@intel.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:44 +01:00
Jeremy Fitzhardinge	38aa66fcb7	xen: remap GSIs as pirqs when running as initial domain Implement xen_register_gsi to setup the correct triggering and polarity properties of a gsi. Implement xen_register_pirq to register a particular gsi as pirq and receive interrupts as events. Call xen_setup_pirqs to register all the legacy ISA irqs as pirqs. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:43 +01:00
Stefano Stabellini	6b0661a5e6	xen: introduce XEN_DOM0 as a silent option Add XEN_DOM0 to arch/x86/xen/Kconfig as a silent compile time option that gets enabled when xen and basic x86, acpi and pci support are selected. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:43 +01:00
Stefano Stabellini	809f9267bb	xen: map MSIs into pirqs Map MSIs into pirqs, writing 0 in the MSI vector data field and the pirq number in the MSI destination id field. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:43 +01:00
Stefano Stabellini	3942b740e5	xen: support GSI -> pirq remapping in PV on HVM guests Disable pcifront when running on HVM: it is meant to be used with pv guests that don't have PCI bus. Use acpi_register_gsi_xen_hvm to remap GSIs into pirqs. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge	90f6881e64	xen: add xen hvm acpi_register_gsi variant Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-10-22 21:25:42 +01:00
Jeremy Fitzhardinge	2f065aef17	acpi: use indirect call to register gsi in different modes Rather than using a tree of conditionals, use function pointer for acpi_register_gsi. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl>	2010-10-22 21:25:41 +01:00
Stefano Stabellini	42a1de56f3	xen: implement xen_hvm_register_pirq xen_hvm_register_pirq allows the kernel to map a GSI into a Xen pirq and receive the interrupt as an event channel from that point on. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 21:25:41 +01:00
Stefano Stabellini	67ba37293e	Merge commit 'konrad/stable/xen-pcifront-0.8.2' into 2.6.36-rc8-initial-domain-v6	2010-10-22 21:24:06 +01:00
Ian Campbell	9e9a5fcb04	xen: use host E820 map for dom0 When running as initial domain, get the real physical memory map from xen using the XENMEM_machine_memory_map hypercall and use it to setup the e820 regions. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-22 13:19:19 -07:00
Ian Campbell	375b2a9ada	xen: correctly rebuild mfn list list after migration. Otherwise the second migration attempt fails because the mfn_list_list still refers to all the old mfns. We need to update the entries in both p2m_top_mfn and the mid_mfn pages which p2m_top_mfn refers to. In order to do this we need to keep track of the virtual addresses mapping the p2m_mid_mfn pages, since we cannot rely on mfn_to_virt(p2m_top_mfn[idx]): p2m_top_mfn[idx] will still contain the old MFN after a migration, which may now belong to another domain and hence have a different mapping in the m2p. Therefore add and maintain a third top level page, p2m_top_mfn_p[], which tracks the virtual addresses of the mfns contained in p2m_top_mfn[]. We also need to update the content of the p2m_mid_missing_mfn page on resume to refer to the page's new mfn. p2m_missing does not need updating since the migration process takes care of the leaf p2m pages for us. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:36 -07:00
Jeremy Fitzhardinge	3654581e47	xen: don't add extra_pages for RAM after mem_end If an E820 region is entirely beyond mem_end, don't attempt to truncate it and add the truncated pages to extra_pages, as they will be negative. Also, make sure the extra memory region starts after all BIOS provided E820 regions (and in the case of RAM regions, post-clipping). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:32 -07:00
Jeremy Fitzhardinge	41f2e4771a	xen: add support for PAT Convert Linux PAT entries into Xen ones when constructing ptes. Linux doesn't use _PAGE_PAT for ptes, so the only difference in the first 4 entries is that Linux uses _PAGE_PWT for WC, whereas Xen (and default) use it for WT. xen_pte_val does the inverse conversion. We hard-code assumptions about Linux's current PAT layout, but a warning on the wrmsr to MSR_IA32_CR_PAT should point out any problems. If necessary we could go to a more general table-based conversion between Linux and Xen PAT entries. hugetlbfs poses a problem at the moment, the x86 architecture uses the same flag for _PAGE_PAT and _PAGE_PSE, which changes meaning depending on which pagetable level we're using. At the moment this should be OK so long as nobody tries to do a pte_val on a hugetlbfs pte. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:31 -07:00
Jeremy Fitzhardinge	2f7acb2085	xen: make sure xen_max_p2m_pfn is up to date Keep xen_max_p2m_pfn up to date with the end of the extra memory we're adding. It is possible that it will be too high since memory may be truncated by a "mem=" option on the kernel command line, but that won't matter. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:30 -07:00
Jeremy Fitzhardinge	698bb8d14a	xen: limit extra memory to a certain ratio of base If extra memory is very much larger than the base memory size then all of the base memory can be filled with structures reserved to describe the extra memory, leaving no space for anything else. Even at the maximum ratio there will be little space for anything else, but this change is intended to at least allow the system to boot rather than crash mysteriously. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge	b5b43ced7a	xen: add extra pages for E820 RAM regions, even if beyond mem_end If an entire E820 RAM region is beyond mem_end, still add its pages to the extra area so that space can be used by the kernel. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:29 -07:00
Jeremy Fitzhardinge	36bc251b87	xen: make sure xen_extra_mem_start is beyond all non-RAM e820 If Xen gives us non-RAM E820 entries (dom0 only, typically), then make sure the extra RAM region is beyond them. It's OK for the extra space to grow into E820 regions, however. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:28 -07:00
Jeremy Fitzhardinge	42ee1471e9	xen: implement "extra" memory to reserve space for pages not present at boot When using the e820 map to get the initial pseudo-physical address space, look for either Xen-provided memory which doesn't lie within an E820 region, or an E820 RAM region which extends beyond the Xen-provided memory range. Count these pages, and add them to a new "extra memory" range. This range has an E820 RAM range to describe it - so the kernel will allocate page structures for it - but it is also marked reserved so that the kernel will not attempt to use it. The balloon driver can then add this range as a set of currently ballooned-out pages, which can be used to extend the domain beyond its original size. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:27 -07:00
Ian Campbell	35ae11fd14	xen: Use host-provided E820 map Rather than simply using a flat memory map from Xen, use its provided E820 map. This allows the domain builder to tell the domain to reserve space for more pages than those initially provided at domain-build time. It also allows the host to specify holes in the address space (for PCI-passthrough, for example). Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:27 -07:00
Jeremy Fitzhardinge	cfd8951e08	xen: don't map missing memory When setting up a pte for a missing pfn (no matching mfn), just create an empty pte rather than a junk mapping. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:26 -07:00
Jeremy Fitzhardinge	33a847502b	xen: defer building p2m mfn structures until kernel is mapped When building mfn parts of p2m structure, we rely on being able to use mfn_to_virt, which in turn requires kernel to be mapped into the linear area (which is distinct from the kernel image mapping on 64-bit). Defer calling xen_build_mfn_list_list() until after xen_setup_kernel_pagetable(); Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge	c3798062f1	xen: add return value to set_phys_to_machine() set_phys_to_machine() can return false on failure, which means a memory allocation failure for the p2m structure. It can only fail if setting the mfn for a pfn in previously unused address space. It is guaranteed to succeed if you're setting a mapping to INVALID_P2M_ENTRY or updating the mfn for an existing pfn. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:25 -07:00
Jeremy Fitzhardinge	58e05027b5	xen: convert p2m to a 3 level tree Make the p2m structure a 3 level tree which covers the full possible physical space. The p2m structure contains mappings from the domain's pfns to system-wide mfns. The structure has 3 levels and two roots. The first root is for the domain's own use, and is linked with virtual addresses. The second is all mfn references, and is used by Xen on save/restore to allow it to update the p2m mapping for the domain. At boot, the domain builder provides a simple flat p2m array for all the initially present pages. We construct the two levels above that using the early_brk allocator. After early boot time, set_phys_to_machine() will allocate any missing levels using the normal kernel allocator (at GFP_KERNEL, so it must be called in a normal blocking context). Because the early_brk() API requires us to pre-reserve the maximum amount of memory we could allocate, there is still a CONFIG_XEN_MAX_DOMAIN_MEMORY config option, but its only negative side-effect is to increase the kernel's apparent bss size. However, since all unused brk memory is returned to the heap, there's no real downside to making it large. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:24 -07:00
Jeremy Fitzhardinge	bbbf61eff9	xen: make install_p2mtop_page() static Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge	1f2d9dd309	xen: set the actual extent of the mfn_list_list Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:23 -07:00
Jeremy Fitzhardinge	b7eb4ad391	xen: set shared_info->arch.max_pfn to max_p2m_pfn Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:22 -07:00
Jeremy Fitzhardinge	1e17fc7eff	xen: remove noise about registering vcpu info Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:21 -07:00
Jeremy Fitzhardinge	764f0138b9	xen: allocate level1_ident_pgt Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:20 -07:00
Jeremy Fitzhardinge	f0991802bb	xen: use early_brk for level2_kernel_pgt Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge	a2e8752987	xen: allocate p2m size based on actual max size Allocate p2m tables based on the actual runtime maximum pfn rather than the static config-time limit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:19 -07:00
Jeremy Fitzhardinge	a171ce6e7b	xen: dynamically allocate p2m space Use early brk mechanism to allocate p2m tables, to save memory when booting non-Xen. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:18 -07:00
Jeremy Fitzhardinge	5e941c0939	x86: add RESERVE_BRK_ARRAY() helper Useful when converting static arrays into boottime brk allocated objects. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-22 12:57:17 -07:00
Linus Torvalds	db08bf0877	Merge git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic * git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic/io.h: allow people to override individual funcs bitops: remove duplicated extern declarations bitops: make asm-generic/bitops/find.h more generic asm-generic: kdebug.h: Checkpatch cleanup asm-generic: fcntl: make exported headers use strict posix types asm-generic: cmpxchg does not handle non-long arguments asm-generic: make atomic_add_unless a function	2010-10-22 11:17:06 -07:00
Linus Torvalds	092e0e7e52	Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: vfs: make no_llseek the default vfs: don't use BKL in default_llseek llseek: automatically add .llseek fop libfs: use generic_file_llseek for simple_attr mac80211: disallow seeks in minstrel debug code lirc: make chardev nonseekable viotape: use noop_llseek raw: use explicit llseek file operations ibmasmfs: use generic_file_llseek spufs: use llseek in all file operations arm/omap: use generic_file_llseek in iommu_debug lkdtm: use generic_file_llseek in debugfs net/wireless: use generic_file_llseek in debugfs drm: use noop_llseek	2010-10-22 10:52:56 -07:00
Linus Torvalds	91151240ed	Merge branch 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-irq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, 32-bit: Align percpu area and irq stacks to THREAD_SIZE x86: Move alloc_desk_mask variables inside ifdef x86-32: Align IRQ stacks properly x86: Remove CONFIG_4KSTACKS x86: Always use irq stacks Fixed up trivial conflicts in include/linux/{irq.h, percpu-defs.h}	2010-10-22 08:54:21 -07:00
Linus Torvalds	211baf4ffc	Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Hpet: Avoid the comparator readback penalty	2010-10-22 08:47:45 -07:00
Rakib Mullick	a69a0612c4	[CPUFREQ]: x86, cpufreq: Mark longrun_get_policy with __cpuinit. This patch fixes the following warning. The function longrun_cpu_init() is marked with __cpuinit and calls longrun_get_policy(), which is an __init function. So mark longrun_get_policy() with __cpuinit. WARNING: arch/x86/kernel/cpu/cpufreq/longrun.o(.cpuinit.text+0x4c5): Section mismatch in reference from the function longrun_cpu_init() to the function .init.text:longrun_get_policy() The function __cpuinit longrun_cpu_init() references a function __init longrun_get_policy(). If longrun_get_policy is only used by longrun_cpu_init then annotate longrun_get_policy with a matching annotation. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Signed-off-by: Dave Jones <davej@redhat.com>	2010-10-22 11:44:47 -04:00
Julia Lawall	b2a33c1728	[CPUFREQ] arch/x86/kernel/cpu/cpufreq: Fix unsigned return type In each case, the function has an unsigned return type, but returns a negative constant to indicate an error condition. Each function is only called once. For nforce2_detect_chipset, the result is only compared to 0, and for longrun_determine_freqs, the result is stored in a variable of type (signed) int. Thus, for both functions, unsigned can be dropped from the return type. A semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @exists@ identifier f; constant C; @@ unsigned f(...) { <+... * return -C; ...+> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Dave Jones <davej@redhat.com>	2010-10-22 11:44:47 -04:00
Andi Kleen	46e387bbd8	Merge branch 'hwpoison-hugepages' into hwpoison Conflicts: mm/memory-failure.c	2010-10-22 17:40:48 +02:00
Peter Zijlstra	96681fc3c9	perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations For performance reasons it's best to use memory node local memory for per-cpu buffers. This logic comes from a much larger patch proposed by Stephane. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.514465326@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:26 +02:00
Peter Zijlstra	f80c9e304b	perf, x86: Clean up reserve_ds_buffers() signature Now that reserve_ds_buffers() never fails, change it to return void and remove all code dealing with the error return. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.462621937@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:26 +02:00
Peter Zijlstra	6809b6ea73	perf, x86: Less disastrous PEBS/BTS buffer allocation failure Currently PEBS/BTS buffers are allocated when we instantiate the first event, when this fails everything fails. This is a problem because esp. BTS tries to allocate a rather large buffer (64K), which can easily fail. This patch changes the logic such that when either buffer allocation fails, we simply don't allow events that would use these facilities, but continue functioning for all other events. This logic comes from a much larger patch proposed by Stephane. Suggested-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.354429461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:26 +02:00
Peter Zijlstra	5553be2620	perf, x86: Fixup the precise_ip computation In case we don't have PEBS, the LBR fixup doesn't make sense. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.354429461@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:25 +02:00
Peter Zijlstra	65af94baca	perf, x86: Extract DS alloc/free functions Again, mostly a cleanup to unclutter the reserve_ds_buffer() code. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.304495776@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:25 +02:00
Peter Zijlstra	5ee25c8731	perf, x86: Extract PEBS/BTS allocation functions Mostly a cleanup.. it reduces code indentation and makes the code flow of reserve_ds_buffers() clearer. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.253453452@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:25 +02:00
Peter Zijlstra	b39f88acd7	perf, x86: Extract PEBS/BTS buffer free routines So that we may grow additional call-sites.. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Stephane Eranian <eranian@google.com> LKML-Reference: <20101019134808.196793164@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 14:18:24 +02:00
Jan Beulich	07bd8516a2	x86, asm: Restore parentheses around one pushl_cfi argument These were (intentionally) stripped by "fix CFI macro invocations to deal with shortcomings in gas" to expose problems with unexpected splitting of arguments by older gas also on newer versions, but as it turns out there is at least one distro (Ubuntu 6.06) where even not having any spaces in a macro argument doesn't reliably prevent splitting into multiple arguments. Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Alexander van Heukelum <heukelum@fastmail.fm> LKML-Reference: <4CC157DB020000780001E8A2@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-22 10:51:44 +02:00
Linus Torvalds	3044100e58	Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits) x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S xen: Cope with unmapped pages when initializing kernel pagetable memblock, bootmem: Round pfn properly for memory and reserved regions memblock: Annotate memblock functions with __init_memblock memblock: Allow memblock_init to be called early memblock/arm: Fix memblock_region_is_memory() typo x86, memblock: Remove __memblock_x86_find_in_range_size() memblock: Fix wraparound in find_region() x86-32, memblock: Make add_highpages honor early reserved ranges x86, memblock: Fix crashkernel allocation arm, memblock: Fix the sparsemem build memblock: Fix section mismatch warnings powerpc, memblock: Fix memblock API change fallout memblock, microblaze: Fix memblock API change fallout x86: Remove old bootmem code x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve x86: Remove not used early_res code x86, memblock: Replace e820_/_early string with memblock_ x86: Use memblock to replace early_res x86, memblock: Use memblock_debug to control debug message print out ... Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile	2010-10-21 18:52:11 -07:00
Linus Torvalds	e36f561a2c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags * git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-irqflags: Fix IRQ flag handling naming MIPS: Add missing #inclusions of <linux/irq.h> smc91x: Add missing #inclusion of <linux/irq.h> Drop a couple of unnecessary asm/system.h inclusions SH: Add missing consts to sys_execve() declaration Blackfin: Rename IRQ flags handling functions Blackfin: Add missing dep to asm/irqflags.h Blackfin: Rename DES PC2() symbol to avoid collision Blackfin: Split the BF532 BFIN_*_FIO_FLAG() functions to their own header Blackfin: Split PLL code from mach-specific cdef headers	2010-10-21 14:37:27 -07:00
Linus Torvalds	157b6ceb13	Merge branch 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, iommu: Update header comments with appropriate naming ia64, iommu: Add a dummy iommu_table.h file in IA64. x86, iommu: Fix IOMMU_INIT alignment rules x86, doc: Adding comments about .iommu_table and its neighbors. x86, iommu: Utilize the IOMMU_INIT macros functionality. x86, VT-d: Make Intel VT-d IOMMU use IOMMU_INIT_* macros. x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros. x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros. x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros. x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros. x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine. x86, iommu: Add proper dependency sort routine (and sanity check). x86, iommu: Make all IOMMU's detection routines return a value. x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure	2010-10-21 14:23:48 -07:00
Linus Torvalds	4a60cfa945	Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (96 commits) apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets apic, x86: Check if EILVT APIC registers are available (AMD only) x86: ioapic: Call free_irte only if interrupt remapping enabled arm: Use ARCH_IRQ_INIT_FLAGS genirq, ARM: Fix boot on ARM platforms genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build x86: Switch sparse_irq allocations to GFP_KERNEL genirq: Switch sparse_irq allocator to GFP_KERNEL genirq: Make sparse_lock a mutex x86: lguest: Use new irq allocator genirq: Remove the now unused sparse irq leftovers genirq: Sanitize dynamic irq handling genirq: Remove arch_init_chip_data() x86: xen: Sanitise sparse_irq handling x86: Use sane enumeration x86: uv: Clean up the direct access to irq_desc x86: Make io_apic.c local functions static genirq: Remove irq_2_iommu x86: Speed up the irq_remapped check in hot pathes intr_remap: Simplify the code further ... Fix up trivial conflicts in arch/x86/Kconfig	2010-10-21 14:11:46 -07:00
Linus Torvalds	5fe8321b88	Merge branch 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-x2apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, x2apic: Simplify apic init in SMP and UP builds x86, intr-remap: Remove IRTE setup duplicate code x86, intr-remap: Set redirection hint in the IRTE	2010-10-21 13:54:05 -07:00
Linus Torvalds	709d9f54cc	Merge branch 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI x86, vmware: Remove deprecated VMI kernel support Fix up trivial #include conflict in arch/x86/kernel/smpboot.c	2010-10-21 13:53:24 -07:00
Linus Torvalds	cca8209ed9	Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: XO-1 uses/depends on PCI x86, olpc: Register XO-1 platform devices x86, olpc: Add XO-1 poweroff support x86, olpc: Don't retry EC commands forever x86, olpc: Rework BIOS signature check x86, olpc: Only enable PCI configuration type override on XO-1	2010-10-21 13:52:01 -07:00
Linus Torvalds	d77bdc423d	Merge branch 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mtrr: Support mtrr lookup for range spanning across MTRR range x86, mtrr: Refactor MTRR type overlap check code	2010-10-21 13:51:41 -07:00
Linus Torvalds	87affd0b94	Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: sfi: Make local functions static x86, earlyprintk: Add hsu early console for Intel Medfield platform x86, earlyprintk: Add earlyprintk for Intel Moorestown platform x86: Add two helper macros for fixed address mapping x86, mrst: A function in a header file needs to be marked "inline"	2010-10-21 13:47:54 -07:00
Linus Torvalds	c3b86a2942	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush	2010-10-21 13:47:29 -07:00
Linus Torvalds	8d8d2e9ccd	Merge branch 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mem: Optimize memmove for small size and unaligned cases x86, mem: Optimize memcpy by avoiding memory false dependece x86, mem: Don't implement forward memmove() as memcpy()	2010-10-21 13:46:28 -07:00
Linus Torvalds	2a8b67fb72	Merge branch 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hotplug: In the MWAIT case of play_dead, CLFLUSH the cache line x86, hotplug: Move WBINVD back outside the play_dead loop x86, hotplug: Use mwait to offline a processor, fix the legacy case x86, mwait: Move mwait constants to a common header file	2010-10-21 13:45:38 -07:00
Linus Torvalds	b6f7e38dbb	Merge branch 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, fpu: Merge fpu_save_init() x86-32, fpu: Rewrite fpu_save_init() x86, fpu: Remove PSHUFB_XMM5_* macros x86, fpu: Remove unnecessary ifdefs from i387 code. x86-32, fpu: Remove math_emulate stub x86-64, fpu: Simplify constraints for fxsave/fxtstor x86-64, fpu: Fix %cs value in convert_from_fxsr() x86-64, fpu: Disable preemption when using TS_USEDFPU x86, fpu: Merge __save_init_fpu() x86, fpu: Merge tolerant_fwait() x86, fpu: Merge fpu_init() x86: Use correct type for %cr4 x86, xsave: Disable xsave in i387 emulation mode Fixed up fxsaveq-induced conflict in arch/x86/include/asm/i387.h	2010-10-21 13:34:32 -07:00
Alok Kataria	76fac077db	x86, kexec: Make sure to stop all CPUs before exiting the kernel x86 smp_ops now has a new op, stop_other_cpus which takes a parameter "wait" this allows the caller to specify if it wants to stop until all the cpus have processed the stop IPI. This is required specifically for the kexec case where we should wait for all the cpus to be stopped before starting the new kernel. We now wait for the cpus to stop in all cases except for panic/kdump where we expect things to be broken and we are doing our best to make things work anyway. This patch fixes a legitimate regression, which was introduced during 2.6.30, by commit id `4ef702c10b`. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1286833028.1372.20.camel@ank32.eng.vmware.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: <stable@kernel.org> v2.6.30-36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-21 13:30:44 -07:00
Linus Torvalds	214515b578	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove pr_<level> uses of KERN_<level> therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal' x86: Use {push,pop}{l,q}_cfi in more places i386: Add unwind directives to syscall ptregs stubs x86-64: Use symbolics instead of raw numbers in entry_64.S x86-64: Adjust frame type at paranoid_exit: x86-64: Fix unwind annotations in syscall stubs	2010-10-21 13:20:32 -07:00
Linus Torvalds	bf70030dc0	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cpu: Fix X86_FEATURE_NOPL x86, cpu: Re-run get_cpu_cap() after adjusting the CPUID level	2010-10-21 13:18:36 -07:00
Linus Torvalds	d60a2793ba	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove stale pmtimer_64.c x86, cleanups: Use clear_page/copy_page rather than memset/memcpy x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI x86, cleanup: Remove obsolete boot_cpu_id variable	2010-10-21 13:18:06 -07:00
Linus Torvalds	781c5a67f1	Merge branch 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-bios-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, bios: Make the x86 early memory reservation a kernel option x86, bios: By default, reserve the low 64K for all BIOSes	2010-10-21 13:06:49 -07:00
Linus Torvalds	e990c77d06	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64, asm: If the assembler supports fxsave64, use it i386: Make kernel_execve() suitable for stack unwinding	2010-10-21 13:06:00 -07:00
Linus Torvalds	2f0384e5fc	Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd_nb: Enable GART support for AMD family 0x15 CPUs x86, amd: Use compute unit information to determine thread siblings x86, amd: Extract compute unit information for AMD CPUs x86, amd: Add support for CPUID topology extension of AMD CPUs x86, nmi: Support NMI watchdog on newer AMD CPU families x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB x86, k8-gart: Decouple handling of garts and northbridges x86, cacheinfo: Fix dependency of AMD L3 CID x86, kvm: add new AMD SVM feature bits x86, cpu: Fix allowed CPUID bits for KVM guests x86, cpu: Update AMD CPUID feature bits x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit x86, AMD: Remove needless CPU family check (for L3 cache info) x86, tsc: Remove CPU frequency calibration on AMD	2010-10-21 13:01:08 -07:00
Linus Torvalds	bc4016f481	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (29 commits) sched: Export account_system_vtime() sched: Call tick_check_idle before __irq_enter sched: Remove irq time from available CPU power sched: Do not account irq time to current task x86: Add IRQ_TIME_ACCOUNTING sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time sched: Add a PF flag for ksoftirqd identification sched: Consolidate account_system_vtime extern declaration sched: Fix softirq time accounting sched: Drop group_capacity to 1 only if local group has extra capacity sched: Force balancing on newidle balance if local group has capacity sched: Set group_imb only a task can be pulled from the busiest cpu sched: Do not consider SCHED_IDLE tasks to be cache hot sched: Drop all load weight manipulation for RT tasks sched: Create special class for stop/migrate work sched: Unindent labels sched: Comment updates: fix default latency and granularity numbers tracing/sched: Add sched_pi_setprio tracepoint sched: Give CPU bound RT tasks preference sched: Try not to migrate higher priority RT tasks ...	2010-10-21 12:55:43 -07:00
Linus Torvalds	5d70f79b5e	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits) tracing: Fix compile issue for trace_sched_wakeup.c [S390] hardirq: remove pointless header file includes [IA64] Move local_softirq_pending() definition perf, powerpc: Fix power_pmu_event_init to not use event->ctx ftrace: Remove recursion between recordmcount and scripts/mod/empty jump_label: Add COND_STMT(), reducer wrappery perf: Optimize sw events perf: Use jump_labels to optimize the scheduler hooks jump_label: Add atomic_t interface jump_label: Use more consistent naming perf, hw_breakpoint: Fix crash in hw_breakpoint creation perf: Find task before event alloc perf: Fix task refcount bugs perf: Fix group moving irq_work: Add generic hardirq context callbacks perf_events: Fix transaction recovery in group_sched_in() perf_events: Fix bogus AMD64 generic TLB events perf_events: Fix bogus context time tracking tracing: Remove parent recording in latency tracer graph options tracing: Use one prologue for the preempt irqs off tracer function tracers ...	2010-10-21 12:54:49 -07:00
Linus Torvalds	1053e6bba0	Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Update copyright headers x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend AGP: Warn when GATT memory cannot be set to UC x86, GART: Disable GART table walk probes x86, GART: Remove superfluous AMD64_GARTEN	2010-10-21 12:49:15 -07:00
Daniel Drake	260586d2b4	Add OLPC XO-1 rfkill driver Add a software rfkill switch for the WLAN interface in the OLPC XO-1 laptop. It uses the OLPC embedded controller to cut/restore power to the Marvell WLAN chip on the motherboard. Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Matthew Garrett <mjg@redhat.com>	2010-10-21 10:10:44 -04:00
Konrad Rzeszutek Wilk	5bba6c56dc	X86/PCI: Remove the dependency on isapnp_disable. This looks to be vestigial dependency that had never been used even in the original code base (2.6.18) from which this driver was up-ported. Without this fix, with the CONFIG_ISAPNP, we get this compile failure: arch/x86/pci/xen.c: In function 'pci_xen_init': arch/x86/pci/xen.c:138: error: 'isapnp_disable' undeclared (first use in this function) arch/x86/pci/xen.c:138: error: (Each undeclared identifier is reported only once arch/x86/pci/xen.c:138: error: for each function it appears in.) Reported-by: Li Zefan <lizf@cn.fujitsu.com> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-21 09:36:07 -04:00
Ian Campbell	de1ef2065c	xen/privcmd: move remap_domain_mfn_range() to core xen code and export. This allows xenfs to be built as a module, previously it required flush_tlb_all and arbitrary_virt_to_machine to be exported. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-20 16:22:34 -07:00
Jeremy Fitzhardinge	1246ae0bb9	xen: add variable hypercall caller Allow non-constant hypercall to be called, for privcmd. [ Impact: make arbitrary hypercalls; needed for privcmd ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-20 16:22:27 -07:00
Jeremy Fitzhardinge	eba3ff8b99	xen: add xen_set_domain_pte() Add xen_set_domain_pte() to allow setting a pte mapping a page from another domain. The common case is to map from DOMID_IO, the pseudo domain which owns all IO pages, but will also be used in the privcmd interface to map other domain pages. [ Impact: new Xen-internal API for cross-domain mappings ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2010-10-20 16:22:27 -07:00
FUJITA Tomonori	66f2b06154	x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G Set CONFIG_ARCH_DMA_ADDR_T_64BIT when we set dma_addr_t to 64 bits in <asm/types.h>; this allows Kconfig decisions based on this property. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> LKML-Reference: <201010202255.o9KMtZXu009370@imap1.linux-foundation.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 16:02:42 -07:00
Shaohua Li	9329672021	x86: Spread tlb flush vector between nodes Currently flush tlb vector allocation is based on below equation: sender = smp_processor_id() % 8 This isn't optimal, CPUs from different node can have the same vector, this causes a lot of lock contention. Instead, we can assign the same vectors to CPUs from the same node, while different node has different vectors. This has below advantages: a. if there is lock contention, the lock contention is between CPUs from one node. This should be much cheaper than the contention between nodes. b. completely avoid lock contention between nodes. This especially benefits kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity to specific node. In my test, this could reduce > 20% CPU overhead in extreme case. The test machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd to the first CPU of the node. I run a workload with 4 sequential mmap file read threads. The files are empty sparse files. This workload will trigger a lot of page reclaim and tlbflush. The kswapd bind is to easily trigger the extreme tlb flush lock contention because otherwise kswapd keeps migrating between CPUs of a node and I can't get stable result. Sure in real workload, we can't always see so big tlb flush lock contention, but it's possible. [ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 14:44:42 -07:00
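The vector assignment described in the commit above can be sketched in user-space C. This is an illustration under assumptions, not the kernel's code: the names `old_tlb_vector`, `node_tlb_vector` and `NUM_TLB_VECTORS` are hypothetical, the vector count of 8 is taken from the `% 8` in the message, and the per-node split is one plausible way to give each node its own vector range.

```c
#include <assert.h>

/* Assumption: 8 invalidate vectors, matching the "% 8" in the commit message. */
#define NUM_TLB_VECTORS 8

/* Old scheme quoted in the message: sender = smp_processor_id() % 8. */
static int old_tlb_vector(int cpu)
{
	return cpu % NUM_TLB_VECTORS;
}

/*
 * Hypothetical node-aware scheme: give each node a contiguous range of
 * vectors, then spread that node's CPUs over its range.
 */
static int node_tlb_vector(int node, int cpu_in_node, int nr_nodes)
{
	int per_node;

	assert(nr_nodes > 0 && node >= 0 && cpu_in_node >= 0);

	/* With more nodes than vectors, fall back to one vector per node. */
	per_node = nr_nodes > NUM_TLB_VECTORS ? 1 : NUM_TLB_VECTORS / nr_nodes;

	return (node % NUM_TLB_VECTORS) * per_node + cpu_in_node % per_node;
}
```

On the 4-node, 16-CPU-per-node machine from the message (assuming CPUs are numbered contiguously per node), the old scheme puts CPU 0 of node 0 and CPU 16 of node 1 on the same vector, while the node-aware sketch keeps node 0 on vectors 0-1 and node 1 on vectors 2-3, so any contention stays inside one node.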
Borislav Petkov	b40827fa72	x86-32, mm: Add an initial page table for core bootstrapping This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs - we have initial_page_table for that. Signed-off-by: Borislav Petkov <bp@alien8.de> LKML-Reference: <20101020070526.GA9588@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 14:23:55 -07:00
H. Peter Anvin	d25e6b0b32	Merge branch 'x86/cleanups' into x86/trampoline	2010-10-20 14:22:45 -07:00
H. Peter Anvin	e44dea35cc	Merge branch 'x86/vmware' into x86/trampoline	2010-10-20 13:18:17 -07:00
Borislav Petkov	f01f7c56a1	x86, mm: Fix incorrect data type in vmalloc_sync_all() arch/x86/mm/fault.c: In function 'vmalloc_sync_all': arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast introduced by `617d34d9e5`. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 12:54:04 -07:00
Robert Richter	27afdf2008	apic, x86: Use BIOS settings for IBS and MCE threshold interrupt LVT offsets We want the BIOS to set up the EILVT APIC registers. The offsets were hardcoded and BIOS settings were overwritten by the OS. Now, the subsystems for MCE threshold and IBS determine the LVT offset from the registers the BIOS has set up. If the BIOS setup is buggy on a family 10h system, a workaround enables IBS. If the OS determines an invalid register setup, a "[Firmware Bug]: " error message is reported. We need this change also for upcoming cpu families. Signed-off-by: Robert Richter <robert.richter@amd.com> LKML-Reference: <1286360874-1471-3-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-20 04:42:13 +02:00
Robert Richter	a68c439b19	apic, x86: Check if EILVT APIC registers are available (AMD only) This patch implements checks for the availability of LVT entries (APIC500-530) and reserves it if used. The check becomes necessary since we want to let the BIOS provide the LVT offsets. The offsets should be determined by the subsystems using it like those for MCE threshold or IBS. On K8 only offset 0 (APIC500) and MCE interrupts are supported. Beginning with family 10h at least 4 offsets are available. Since offsets must be consistent for all cores, we keep track of the LVT offsets in software and reserve the offset for the same vector also to be used on other cores. An offset is freed by setting the entry to APIC_EILVT_MASKED. If the BIOS is right, there should be no conflicts. Otherwise a "[Firmware Bug]: ..." error message is generated. However, if software does not properly determine the offsets, it is not necessarily a BIOS bug. Signed-off-by: Robert Richter <robert.richter@amd.com> LKML-Reference: <1286360874-1471-2-git-send-email-robert.richter@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-20 04:42:13 +02:00
Ingo Molnar	14d4962dc8	Merge branch 'linus' into irq/core Merge reason: update to almost-final-.36 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-20 04:38:59 +02:00
Jan Beulich	3234282f33	x86, asm: Fix CFI macro invocations to deal with shortcomings in gas gas prior to (perhaps) 2.16.90 has problems with passing non- parenthesized expressions containing spaces to macros. Spaces, however, get inserted by cpp between any macro expanding to a number and a subsequent + or -. For the +, current x86 gas then removes the space again (future gas may not do so), but for the - the space gets retained and is then considered a separator between macro arguments. Fix the respective definitions for both the - and + cases, so that they neither contain spaces nor make cpp insert any (the latter by adding seemingly redundant parentheses). Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CBDBEBA020000780001E05A@vpn.id2.novell.com> Cc: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 14:28:02 -07:00
Jeremy Fitzhardinge	617d34d9e5	x86, mm: Hold mm->page_table_lock while doing vmalloc_sync Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <ian.cambell@eu.citrix.com> Originally-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CB88A4C.1080305@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge	44235dcde4	x86, mm: Fix bogus whitespace in sync_global_pgds() Whitespace cleanup only. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:56:03 -07:00
Avi Kivity	9581d442b9	KVM: Fix fs/gs reload oops with invalid ldt kvm reloads the host's fs and gs blindly, however the underlying segment descriptors may be invalid due to the user modifying the ldt after loading them. Fix by using the safe accessors (loadsegment() and load_gs_index()) instead of home grown unsafe versions. This is CVE-2010-3698. KVM-Stable-Tag. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-19 14:21:45 -02:00
Yinghai Lu	9717967c4b	x86: ioapic: Call free_irte only if interrupt remapping enabled On a system that support intr-rempping when booting with "intremap=off" [ 177.895501] BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8 [ 177.913316] IP: [<ffffffff8145fc18>] free_irte+0x47/0xc0 ... [ 178.173326] Call Trace: [ 178.173574] [<ffffffff810515b4>] destroy_irq+0x3a/0x75 [ 178.192934] [<ffffffff81051834>] arch_teardown_msi_irq+0xe/0x10 [ 178.193418] [<ffffffff81458dc3>] arch_teardown_msi_irqs+0x56/0x7f [ 178.213021] [<ffffffff81458e79>] free_msi_irqs+0x8d/0xeb Call free_irte only when interrupt remapping is enabled. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CBCB274.7010108@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-19 09:25:33 +02:00
Venkatesh Pallipadi	e82b8e4ea4	x86: Add IRQ_TIME_ACCOUNTING This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it when TSC is enabled. This change just enables fine grained irq time accounting, isn't used yet. Following patches use it for different purposes. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-6-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 20:52:25 +02:00
Peter Zijlstra	e360adbe29	irq_work: Add generic hardirq context callbacks Provide a mechanism that allows running code in IRQ context. It is most useful for NMI code that needs to interact with the rest of the system -- like wakeup a task to drain buffers. Perf currently has such a mechanism, so extract that and provide it as a generic feature, independent of perf so that others may also benefit. The IRQ context callback is generated through self-IPIs where possible, or on architectures like powerpc the decrementer (the built-in timer facility) is set to generate an interrupt immediately. Architectures that don't have anything like this get to do with a callback from the timer tick. These architectures can call irq_work_run() at the tail of any IRQ handlers that might enqueue such work (like the perf IRQ handler) to avoid undue latencies in processing the work. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [ various fixes ] Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1287036094.7768.291.camel@yhuang-dev> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 19:58:50 +02:00
Stephane Eranian	ba0cef3d14	perf_events: Fix bogus AMD64 generic TLB events PERF_COUNT_HW_CACHE_DTLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x7 (to count all possibilities). PERF_COUNT_HW_CACHE_ITLB:READ:MISS had a bogus umask value of 0 which counts nothing. Needed to be 0x3 (to count all possibilities). Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: <stable@kernel.org> # as far back as it applies LKML-Reference: <4cb85478.41e9d80a.44e2.3f00@mx.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 19:58:48 +02:00
Konrad Rzeszutek Wilk	74226b8c8a	xen/pci: Request ACS when Xen-SWIOTLB is activated. It used to be done in the Xen startup code but that is not really appropriate. [v2: Update Kconfig with PCI requirement] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-18 10:49:38 -04:00
Alex Nixon	b5401a96b5	xen/x86/PCI: Add support for the Xen PCI subsystem The frontend stub lives in arch/x86/pci/xen.c, alongside other sub-arch PCI init code (e.g. olpc.c). It provides a mechanism for Xen PCI frontend to setup/destroy legacy interrupts, MSI/MSI-X, and PCI configuration operations. [ Impact: add core of Xen PCI support ] [ v2: Removed the IOMMU code and only focusing on PCI.] [ v3: removed usage of pci_scan_all_fns as that does not exist] [ v4: introduced pci_xen value to fix compile warnings] [ v5: squished fixes+features in one patch, changed Reviewed-by to Ccs] [ v7: added Acked-by] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Qing He <qing.he@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org	2010-10-18 10:49:35 -04:00
Stefano Stabellini	294ee6f89c	x86: Introduce x86_msi_ops Introduce an x86 specific indirect mechanism to setup MSIs. The MSI setup functions become function pointers in an x86_msi_ops struct, that defaults to the implementation in io_apic.c and msi.c. [v2: Use HAVE_DEFAULT_* knobs] Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-18 10:49:34 -04:00
Jeremy Fitzhardinge	5ee01f49c9	x86/PCI: make sure _PAGE_IOMAP it set on pci mappings When mapping pci space via /sys or /proc, make sure we're really doing a hardware mapping by setting _PAGE_IOMAP. [ Impact: bugfix; make PCI mappings map the right pages ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: x86@kernel.org	2010-10-18 10:49:31 -04:00
Alex Nixon	44de3395a4	x86/PCI: Clean up pci_cache_line_size Separate out x86 cache_line_size initialisation code into its own function (so it can be shared by Xen later in this patch series) [ Impact: cleanup ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: x86@kernel.org	2010-10-18 10:49:30 -04:00
Jeremy Fitzhardinge	7b586d7185	x86/io_apic: add get_nr_irqs_gsi() Impact: new interface to get max GSI Add get_nr_irqs_gsi() to return nr_irqs_gsi. Xen will use this to determine how many irqs it needs to reserve for hardware irqs. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: "H. Peter Anvin" <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-18 10:40:30 -04:00
Jeremy Fitzhardinge	d8e0420603	xen: define BIOVEC_PHYS_MERGEABLE() Impact: allow Xen control of bio merging When running in Xen domain with device access, we need to make sure the block subsystem doesn't merge requests across pages which aren't machine physically contiguous. To do this, we define our own BIOVEC_PHYS_MERGEABLE. When CONFIG_XEN isn't enabled, or we're not running in a Xen domain, this has identical behaviour to the normal implementation. When running under Xen, we also make sure the underlying machine pages are the same or adjacent. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-18 10:40:28 -04:00
Alex Nixon	23ace955c2	xen: Don't disable the I/O space If a guest domain wants to access PCI devices through the frontend driver (coming later in the patch series), it will need access to the I/O space. [ Impact: Allow for domU IO access, preparing for pci passthrough ] Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2010-10-18 10:40:27 -04:00
Justin P. Mattock	50a23e6eec	Update broken web addresses in arch directory. The patch below updates broken web addresses in the arch directory. Signed-off-by: Justin P. Mattock <justinmattock@gmail.com> Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Cc: Finn Thain <fthain@telegraphics.com.au> Cc: Randy Dunlap <rdunlap@xenotime.net> Reviewed-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-10-18 11:03:21 +02:00
Bjorn Helgaas	1ca98fa652	x86/PCI: MMCONFIG: fix region end calculation The end of an MMCONFIG region depends on the ending bus number, not on the number of buses the region covers. We previously computed the wrong ending address whenever the starting bus number was non-zero, e.g.,: MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000) MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe1ffffff] (base 0xe0000000) The correct regions are: MMCONFIG for [bus 00-1f] at [mem 0xe0000000-0xe1ffffff] (base 0xe0000000) MMCONFIG for [bus 20-3f] at [mem 0xe2000000-0xe3ffffff] (base 0xe0000000) Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-17 20:03:07 -07:00
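The arithmetic behind the MMCONFIG fix above can be checked with a small C sketch. The function names are hypothetical and the "by count" variant is only a reconstruction of the pre-fix behaviour inferred from the log output quoted in the message, not the kernel's actual code. It relies on each bus decoding 1 MiB of MMCONFIG space (32 devices x 8 functions x 4 KiB).

```c
#include <assert.h>
#include <stdint.h>

/* Each bus decodes 1 MiB of MMCONFIG space: 32 devices x 8 functions x 4 KiB. */
#define MMCFG_BUS_SHIFT 20

/* First byte of the region, offset from the base by the starting bus number. */
static uint64_t mmcfg_region_start(uint64_t base, unsigned int start_bus)
{
	return base + ((uint64_t)start_bus << MMCFG_BUS_SHIFT);
}

/*
 * Reconstruction of the pre-fix result: the end is derived from the number
 * of buses, which is only right when start_bus is 0.
 */
static uint64_t mmcfg_region_end_by_count(uint64_t base, unsigned int start_bus,
					  unsigned int end_bus)
{
	assert(end_bus >= start_bus);
	return base + ((uint64_t)(end_bus - start_bus + 1) << MMCFG_BUS_SHIFT) - 1;
}

/* Fixed calculation: the end depends on the ending bus number. */
static uint64_t mmcfg_region_end(uint64_t base, unsigned int end_bus)
{
	return base + ((uint64_t)(end_bus + 1) << MMCFG_BUS_SHIFT) - 1;
}
```

For base 0xe0000000 and buses 20-3f, the region starts at 0xe2000000; the count-based end is 0xe1ffffff (below the start, as in the broken log line), while the end derived from the ending bus is 0xe3ffffff.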
Seth Heasley	cb04e95bdd	PCI: update Intel chipset names and defines This patch updates the defines for Intel devices in include/linux/pci_ids.h, referenced in arch/x86/pci/irq.c and drivers/i2c/busses/i2c-i801.c, reflecting approved legal branding, and using fuller code-names for products under development. Acked-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-17 20:03:04 -07:00
Seth Heasley	25143fd127	x86/PCI: irq and pci_ids patch for Intel Patsburg DeviceIDs This patch adds the LPC Controller DeviceIDs for the Intel Patsburg PCH. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-15 13:09:52 -07:00
Daniel Drake	80e7b19ae1	PCI: OLPC: Only enable PCI configuration type override on XO-1 This configuration type override is for XO-1 only and must not happen on XO-1.5. Acked-by: Andres Salomon <dilinger@queued.net> Signed-off-by: Daniel Drake <dsd@laptop.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-15 13:09:51 -07:00
Thomas Gleixner	40ffa93791	x86: Remove stale pmtimer_64.c This file is unused since the apic unification in 2.6.29, but nobody noticed. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-15 21:18:59 +02:00
Thomas Gleixner	940b3c7b19	x86: sfi: Make local functions static Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org>	2010-10-15 19:37:39 +02:00
Arnd Bergmann	6038f373a3	llseek: automatically add .llseek fop All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) 
\| nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E \| *off += E \| func(..., off, ...) \| E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E \| *off += E \| func(..., off, ...) \| E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... 
+.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... 
+.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>	2010-10-15 15:53:27 +02:00
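The selection rules listed at the top of the semantic patch above can be restated as a plain C decision function. This is only a sketch: the enum, the `fops_facts` struct and `pick_llseek` are hypothetical names, and the strict priority order is a simplification of the `depends on` clauses in the patch rather than an exact translation of them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the .llseek variants named in the commit message. */
enum llseek_choice {
	KEEP_EXISTING,	/* fops already sets .llseek, leave it alone */
	NO_LLSEEK,
	SEQ_LSEEK,
	DEFAULT_LLSEEK,
	NOOP_LLSEEK,
};

/* Hypothetical summary of what the semantic patch detects about one fops. */
struct fops_facts {
	bool has_llseek;		/* .llseek is already present */
	bool calls_nonseekable_open;	/* open (or a nested helper) calls nonseekable_open */
	bool uses_seq_read;		/* .read = seq_read */
	bool has_readdir;		/* .readdir is present */
	bool rw_touches_f_pos;		/* read or write accesses the offset */
};

static enum llseek_choice pick_llseek(const struct fops_facts *f)
{
	assert(f);

	if (f->has_llseek)
		return KEEP_EXISTING;
	if (f->calls_nonseekable_open)
		return NO_LLSEEK;
	if (f->uses_seq_read)
		return SEQ_LSEEK;
	if (f->has_readdir || f->rw_touches_f_pos)
		return DEFAULT_LLSEEK;
	/* Offset provably ignored: keep seeks succeeding without effect. */
	return NOOP_LLSEEK;
}
```

A driver whose read and write never look at the offset ends up with `NOOP_LLSEEK`, which matches the message's point that noop_llseek preserves the current behaviour of not returning an error from a seek.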
Robert Richter	b47fad3bfb	oprofile, x86: Add support for IBS periodic op counter extension The count value for IBS op sampling has been extended by 7 bits. The feature is reflected in bit 6 (OpCntExt) of the IBS capability register (CPUID Fn8000_001B_EAX). Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:43 +02:00
Robert Richter	25da695047	oprofile, x86: Add support for IBS branch target address reporting This patch adds support for IBS branch target address reporting. A new MSR (MSRC001_103B IBS Branch Target Address) has been added that provides the logical address in canonical form for the branch target. The size of the IBS sample that is transferred to the userland has been increased. For backward compatibility, the userland daemon must explicit enable the feature by writing to the oprofilefs file ibs_op/branch_target After enabling branch target address reporting, the userland daemon must handle the extended size of the IBS sample. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:42 +02:00
Robert Richter	53b39e9480	oprofile, x86: Introduce struct ibs_state This patch introduces struct ibs_state that will be extended by additional members in follow-on patches. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:41 +02:00
Robert Richter	fc889aa23f	oprofile, x86: Remove duplicate check for IBS_CAPS_OPCNT Since oprofile is setting up ibs_op/dispatched_ops in the fs only if the feature is available, its corresponding variable ibs_config.dispatched_ops is only set, if the feature is available. Thus the check is duplicate and can be removed. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:41 +02:00
Robert Richter	4ac945f002	oprofile, x86: Check IBS capability bits 1 and 2 There are IBS CPUID feature flags in CPUID Fn8000_001B to detect if the cpu supports IBS fetch sampling (FetchSam) and/or IBS execution sampling (OpSam). This patch adds checks if the both features are available. Spec: http://support.amd.com/us/Processor_TechDocs/31116.pdf Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:40 +02:00
Robert Richter	e63414740e	oprofile, x86: Add support for AMD family 14h This patch adds support for AMD family 14h (Ontario/Zacate) cpus. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:39 +02:00
Robert Richter	3acbf0849b	oprofile, x86: Add support for AMD family 12h This patch adds support for AMD family 12h (Llano) cpus. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-15 12:50:39 +02:00
Robert Richter	6268464b37	Merge remote branch 'tip/perf/core' into oprofile/core Conflicts: arch/arm/oprofile/common.c kernel/perf_event.c	2010-10-15 12:45:00 +02:00
Randy Dunlap	9e9006e909	x86, olpc: XO-1 uses/depends on PCI olpc-xo1 uses pci_*() interfaces so it should depend on PCI. Otherwise we get build failure like: arch/x86/kernel/olpc-xo1.c:65: error: implicit declaration of function 'pci_enable_device_io' arch/x86/kernel/olpc-xo1.c:71: error: implicit declaration of function 'pci_request_region' arch/x86/kernel/olpc-xo1.c:80: error: implicit declaration of function 'pci_release_region' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Daniel Drake <dsd@laptop.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20101014101313.adf7eb2a.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-15 09:41:38 +02:00
Ingo Molnar	0fdf13606b	Merge branch 'tip/perf/recordmcount-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2010-10-15 06:12:28 +02:00
Steven Rostedt	cf4db2597a	ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT The config option used by archs to let the build system know that the C version of the recordmcount works for said arch is currently called HAVE_C_MCOUNT_RECORD which enables BUILD_C_RECORDMCOUNT. To be more consistent with the name that all archs may use, it has been renamed to HAVE_C_RECORDMCOUNT. This will be less confusing since we are building a C recordmcount and not a mcount_record. Suggested-by: Ingo Molnar <mingo@elte.hu> Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-14 23:32:44 -04:00
Ingo Molnar	d9d572a9c0	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-10-15 05:12:45 +02:00
Steven Rostedt	72441cb1fd	ftrace/x86: Add support for C version of recordmcount This patch adds the support for the C version of recordmcount and compile times show ~ 12% improvement. After verifying this works, other archs can add: HAVE_C_MCOUNT_RECORD in its Kconfig and it will use the C version of recordmcount instead of the perl version. Cc: <linux-arch@vger.kernel.org> Cc: Michal Marek <mmarek@suse.cz> Cc: linux-kbuild@vger.kernel.org Cc: John Reiser <jreiser@bitwagon.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2010-10-14 16:52:41 -04:00
Frederic Weisbecker	ebc8827f75	x86: Barf when vmalloc and kmemcheck faults happen in NMI In x86, faults exit by executing the iret instruction, which then reenables NMIs if we faulted in NMI context. Then if a fault happens in NMI, another NMI can nest after the fault exits. But we don't yet support nested NMIs because we have only one NMI stack. To prevent from that, check that vmalloc and kmemcheck faults don't happen in this context. Most of the other kernel faults in NMIs can be more easily spotted by finding explicit copy_from,to_user() calls on review. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>	2010-10-14 20:43:36 +02:00
Linus Torvalds	0eead9ab41	Don't dump task struct in a.out core-dumps akiphie points out that a.out core-dumps have that odd task struct dumping that was never used and was never really a good idea (it goes back into the mists of history, probably the original core-dumping code). Just remove it. Also do the access_ok() check on dump_write(). It probably doesn't matter (since normal filesystems all seem to do it anyway), but he points out that it's normally done by the VFS layer, so ... [ I suspect that we should possibly do "vfs_write()" instead of calling ->write directly. That also does the whole fsnotify and write statistics thing, which may or may not be a good idea. ] And just to be anal, do this all for the x86-64 32-bit a.out emulation code too, even though it's not enabled (and won't currently even compile) Reported-by: akiphie <akiphie@lavabit.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-14 10:57:40 -07:00
Jeremy Fitzhardinge	67e87f0a1c	x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S head_64.S maps up to 512 MiB, but that is not necessarily true for other entry paths, such as Xen. Thus, co-locate the setting of max_pfn_mapped with the code to actually set up the page tables in head_64.S. The 32-bit code is already so co-located. (The Xen code already sets max_pfn_mapped correctly for its own use case.) -v2: Yinghai fixed the following bug in this patch: \| \| max_pfn_mapped is in .bss section, so we need to set that \| after bss get cleared. Without that we crash on bootup. \| \| That is safe because Xen does not call x86_64_start_kernel(). \| Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Fixed-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <4CB6AB24.9020504@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 09:06:49 +02:00
Masami Hiramatsu	3cba11d32b	kconfig/x86: Add HAVE_TEXT_POKE_SMP config for stop_machine dependency Since text_poke_smp() definitely depends on the actual stop_machine() on SMP, add that dependency to Kconfig. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20101014031042.4100.90877.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 08:55:29 +02:00
Masami Hiramatsu	3caa37519c	x86: Use __stop_machine() in text_poke_smp() Use __stop_machine() in text_poke_smp() because the caller must get online_cpus before calling text_poke_smp(), but stop_machine() does it again. We don't need that. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: 2nddept-manager@sdl.hitachi.co.jp Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20101014031036.4100.83989.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 08:55:28 +02:00
Randy Dunlap	03f1a17cd5	x86/vsmp: Eliminate kconfig dependency warning Fix kconfig dependency warning to satisfy dependencies: warning: (X86_VSMP && X86_64 && PCI && X86_EXTENDED_PLATFORM \|\| XEN && PARAVIRT_GUEST && (X86_64 \|\| X86_32 && X86_PAE && !X86_VISWS) && X86_CMPXCHG && X86_TSC \|\| KVM_CLOCK && PARAVIRT_GUEST \|\| KVM_GUEST && PARAVIRT_GUEST \|\| LGUEST_GUEST && PARAVIRT_GUEST && X86_32) selects PARAVIRT which has unmet direct dependencies (PARAVIRT_GUEST) Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Ravikiran Thirumalai <kiran@scalex86.org> LKML-Reference: <20101013210023.9a033222.randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-14 07:04:30 +02:00
Linus Torvalds	509d4486bd	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: For each node, register the memory blocks actually used x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order x86, mce, therm_throt.c: Fix missing curly braces in error handling logic	2010-10-13 16:34:23 -07:00
Jeremy Fitzhardinge	fef5ba7979	xen: Cope with unmapped pages when initializing kernel pagetable Xen requires that all pages containing pagetable entries be mapped read-only. If pages used for the initial pagetable are already mapped then we can change the mapping to RO. However, if they are initially unmapped, we need to make sure that when they are later mapped, they are also mapped RO. We do this by knowing that the kernel pagetable memory is pre-allocated in the range e820_table_start - e820_table_end, so any pfn within this range should be mapped read-only. However, the pagetable setup code early_ioremaps the pages to write their entries, so we must make sure that mappings created in the early_ioremap fixmap area are mapped RW. (Those mappings are removed before the pages are presented to Xen as pagetable pages.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4CB63A80.8060702@goop.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 16:07:13 -07:00
H. Peter Anvin	d7acb92fea	x86-64, asm: If the assembler supports fxsave64, use it Kbuild allows for us to probe for the existence of specific constructs in the assembler, use them to find out if we can use fxsave64 and permit the compiler to generate better code. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 16:00:29 -07:00
Daniel Drake	447b1d43de	x86, olpc: Register XO-1 platform devices The upcoming XO-1 rfkill driver (for drivers/platform/x86) will register itself with the name "xo1-rfkill", and the already-merged XO-1 poweroff code uses name "olpc-xo1" Add the necessary mechanics so that these devices are properly initialized on XO-1 laptops. Signed-off-by: Daniel Drake <dsd@laptop.org> LKML-Reference: <20101013181042.90C8F9D401B@zog.reactivated.net> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 11:51:25 -07:00
Ingo Molnar	3d8a1a6a8a	Merge branch 'amd-iommu/2.6.37' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into core/iommu	2010-10-13 15:44:24 +02:00
Joerg Roedel	5d0d71569e	x86/amd-iommu: Update copyright headers This patch updates the copyright headers in all source files of the AMD IOMMU driver. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-10-13 11:13:21 +02:00
Matthew Garrett	5bcd757f93	x86/amd-iommu: Reenable AMD IOMMU if it's mysteriously vanished over suspend AMD's reference BIOS code had a bug that could result in the firmware failing to reenable the iommu on resume. It transpires that this causes certain less than desirable behaviour when it comes to PCI accesses, to wit them ending up somewhere near Bristol when the more desirable outcome was Edinburgh. Sadness ensues, perhaps along with filesystem corruption. Let's make sure that it gets turned back on, and that we restore its configuration so decisions it makes bear some resemblance to those made by reasonable people rather than crack-addled lemurs who spent all your DMA on Thunderbird. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>	2010-10-13 11:11:46 +02:00
Daniel Drake	bf1ebf0079	x86, olpc: Add XO-1 poweroff support Add a pm_power_off handler for the OLPC XO-1 laptop. The driver can be built modular and follows the behaviour of the APM driver, setting pm_power_off to NULL on unload. However, the ability to unload the module will probably be removed (with a simple __module_get(THIS_MODULE)) if/when XO-1 suspend/resume support is added to this file at a later date. Signed-off-by: Daniel Drake <dsd@laptop.org> LKML-Reference: <20101010094032.9AE669D401B@zog.reactivated.net> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-12 17:31:15 -07:00
Thomas Gleixner	2ee3906598	x86: Switch sparse_irq allocations to GFP_KERNEL No callers from atomic context (except boot) anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:46 +02:00
Thomas Gleixner	c2f31c37b7	x86: lguest: Use new irq allocator Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au>	2010-10-12 16:53:45 +02:00
Thomas Gleixner	ad9f43340f	x86: Use sane enumeration Instead of looping through all interrupts, use the bitmap lookup to find the next. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:44 +02:00
Thomas Gleixner	48b2650196	x86: uv: Clean up the direct access to irq_desc Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:43 +02:00
Thomas Gleixner	1a8ce7ff68	x86: Make io_apic.c local functions static No users outside of io_apic.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:43 +02:00
Thomas Gleixner	1a0730d664	x86: Speed up the irq_remapped check in hot pathes irq_2_iommu is in struct irq_cfg, so we can do the irq_remapped check based on irq_cfg instead of going through a lookup function. That's especially interesting in the eoi_ioapic_irq() hotpath. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:42 +02:00
Thomas Gleixner	423f085952	x86: Embedd irq_2_iommu into irq_cfg That interrupt remapping code is x86 specific and tied to the io_apic code. No need for separate allocator functions in the interrupt remapping code. This allows to simplify the code and irq_2_iommu is small (13 bytes on 64bit) so it's not a real problem even if interrupt remapping is runtime disabled. If it's compile time disabled the impact is zero. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:41 +02:00
Thomas Gleixner	bc5fdf9f3a	x86: io_apic: Remove the now unused sparse_irq arch_* functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	fbc6bff04a	x86: ioapic: Cleanup sparse irq code Switch over to the new allocator and remove all the magic which was caused by the inability to destroy irq descriptors. Get rid of the create_irq_nr() loop for sparse and non sparse irq. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Yinghai Lu	fe6dab4e79	x86: Don't setup ioapic irq for sci twice The sparseirq rework triggered a warning in the iommu code, which was caused by setting up ioapic for ACPI irq 9 twice. This function is solely to handle interrupts which are on a secondary ioapic and outside the legacy irq range. Replace the sparse irq_to_desc check with a non ifdeffed version. [ tglx: Moved it before the ioapic sparse conversion and simplified the inverse logic ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB00122.3030301@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	f981a3dc19	x86: io_apic: Prepare alloc/free_irq_cfg() Rename the grossly misnamed get_one_free_irq_cfg() to alloc_irq_cfg(). Add a (not yet used) irq number argument to free_irq_cfg() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	08c33db6d0	x86: Implement new allocator functions Implement new allocator functions which make use of the core changes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:40 +02:00
Thomas Gleixner	6e2fff50a5	x86: ioapic: Cleanup get_one_free_irq_cfg() Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	7e495529b6	x86: ioapic: Cleanup some more Cleanup after the irq_chip conversion a bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	be5b7bf738	x86: Convert ht set_affinity to new chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	0e09ddf2d7	x86: Cleanup hpet affinity setting Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	fe52b2d259	x86: Convert dmar affinity setting to new chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org>	2010-10-12 16:53:39 +02:00
Thomas Gleixner	b5d1c46579	x86: Convert remapped msi to new chip.irq_set_affinity function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	f19f5ecc92	x86: Convert remapped ioapic affinity setting to new irq chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	5346b2a78f	x86: Convert msi affinity setting to new chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	f7e909eae4	x86: Prepare the affinity common functions for taking struct irq_data * While at it rename it to sensible function names and fix the return value from unsigned to int for __ioapic_set_affinity (set_desc_affinity). Returning -1 in a function returning unsigned int is somewhat strange. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	60c69948e5	x86: ioapic: Clean up the direct access to irq_desc Most of it is useless pseudo optimization. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:38 +02:00
Thomas Gleixner	e9f7ac664b	ht: Convert to new irq_chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	5c2837fbaa	dmar: Convert to new irq chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <dwmw2@infradead.org>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	d0fbca8f93	x86: ioapic/hpet: Convert to new chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	90297c5fe7	x86: ioapic: Convert mask to new irq_chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	61a38ce3f5	x86: io_apic: Convert startup to new irq_chip function Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:37 +02:00
Thomas Gleixner	dd5f15e5cf	x86: Cleanup io_apic Sanitize functions. Remove irq_desc pointer magic. Preparatory patch for further cleanups. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:36 +02:00
Thomas Gleixner	d4eba29770	x86: Cleanup access to irq_data Fixup the open coded access to irq_desc->[handler_data\|chip_data\|msi-desc] Use the macros and inline functions for it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:36 +02:00
Thomas Gleixner	4305df947c	x86: i8259: Convert to new irq_chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:36 +02:00
Thomas Gleixner	020dd984d7	x86: Cleanup visws interrupt handling Remove the open coded access to irq_desc and convert to the new irq chip functions. Change the mask function of piix4_virtual_irq_type so we can use the generic irq handling function for the virtual interrupt instead of open coding it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	fe25c7fc2e	x86: lguest: Convert to new irq chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	a5ef2e7040	x86: Sanitize apb timer interrupt handling Disable the interrupt in CPU_DEAD where it belongs. Remove the open coded irq_desc manipulation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	a3c08e5d80	x86: Convert irq_chip access to new functions Before moving the irq chips to the new functions, fixup direct callers. The cpu offline irq fixup code needs to become generic and archs need to honour the "force" flag as an indicator, but that's for later. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:35 +02:00
Thomas Gleixner	011d578fda	x86: Remove useless reinitialization of irq descriptors The descriptors are already initialized in exactly this way. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:53:34 +02:00
Thomas Gleixner	39431acb1a	pci: Cleanup the irq_desc mess in msi Handing down irq_desc to msi just so that msi can access irq_desc.irq_data.msi_desc is a pretty stupid idea. The calling code can hand down a pointer to msi_desc so msi code does not need to know about the irq descriptor at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>	2010-10-12 16:53:34 +02:00
Thomas Gleixner	1c9db52534	pci: Convert msi to new irq_chip functions Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <linux@arm.linux.org.uk>	2010-10-12 16:53:34 +02:00
Thomas Gleixner	7c5f13519a	Merge branch 'x86/urgent' of into irq/sparseirq Reason: Pull in the latest io_apic bugfixes Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-12 16:41:26 +02:00
Thomas Gleixner	5e62feabcc	Merge branch 'x86/cleanups' into irq/sparseirq Reason: Avoid conflicts with removal of boot_cpu_id Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-12 16:40:42 +02:00
Thomas Gleixner	8ffcfa4e2d	Merge branch 'x86/x2apic' into irq/sparseirq Reason: Avoid conflicts with the x2apic modifications Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-10-12 16:39:53 +02:00
Thomas Gleixner	b683de2b3c	genirq: Query arch for number of early descriptors sparse irq sets up NR_IRQS_LEGACY irq descriptors and archs then go ahead and allocate more. Use the unused return value of arch_probe_nr_irqs() to let the architecture return the number of early allocations. Fix up all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-12 16:39:08 +02:00
Michal Marek	239060b93b	Merge branch 'kbuild/rc-fixes' into kbuild/kconfig We need to revert the temporary hack in `71ebc01`, hence the merge.	2010-10-12 15:09:06 +02:00
Zhang Rui	dab5fff14d	acpi-cpufreq: fix a memleak when unloading driver We didn't free per_cpu(acfreq_data, cpu)->freq_table when acpi_freq driver is unloaded. Resulting in the following messages in /sys/kernel/debug/kmemleak: unreferenced object 0xf6450e80 (size 64): comm "modprobe", pid 1066, jiffies 4294677317 (age 19290.453s) hex dump (first 32 bytes): 00 00 00 00 e8 a2 24 00 01 00 00 00 00 9f 24 00 ......$.......$. 02 00 00 00 00 6a 18 00 03 00 00 00 00 35 0c 00 .....j.......5.. backtrace: [<c123ba97>] kmemleak_alloc+0x27/0x50 [<c109f96f>] __kmalloc+0xcf/0x110 [<f9da97ee>] acpi_cpufreq_cpu_init+0x1ee/0x4e4 [acpi_cpufreq] [<c11cd8d2>] cpufreq_add_dev+0x142/0x3a0 [<c11920b7>] sysdev_driver_register+0x97/0x110 [<c11cce56>] cpufreq_register_driver+0x86/0x140 [<f9dad080>] 0xf9dad080 [<c1001130>] do_one_initcall+0x30/0x160 [<c10626e9>] sys_init_module+0x99/0x1e0 [<c1002d97>] sysenter_do_call+0x12/0x26 [<ffffffff>] 0xffffffff https://bugzilla.kernel.org/show_bug.cgi?id=15807#c21 Tested-by: Toralf Forster <toralf.foerster@gmx.de> Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2010-10-12 00:58:28 -04:00
H. Peter Anvin	8e4029ee35	Merge branch 'x86/urgent' into core/memblock Reason for merge: Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree. Resolved Conflicts: arch/x86/mm/srat_64.c Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 17:05:11 -07:00
Nikanth Karthikesan	50f2d7f682	x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA commit `d9c2d5ac6a` "x86, numa: Use near(er) online node instead of roundrobin for NUMA" changed NUMA initialization on Intel to choose the nearest online node or first node. Fake NUMA would be better off with round-robin initialization, instead of all CPUs on the first node. Change the choice of first node back to round-robin. For testing NUMA kernel behaviour without cpusets and NUMA aware applications, it would be better to have cpus in different nodes, rather than all in a single node. With cpusets migration of tasks scenarios cannot not be tested. I guess having it round-robin shouldn't affect the use cases for all cpus on the first node. The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to be the case, which was changed by commit `d9c2d5ac6`. It changed from roundrobin to nearer or first node. And I couldn't find any reason for this change in its changelog. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2010-10-11 16:16:56 -07:00
Jeremy Fitzhardinge	236260b90d	memblock: Allow memblock_init to be called early The Xen setup code needs to call memblock_x86_reserve_range() very early, so allow it to initialize the memblock subsystem before doing so. The second memblock_init() is ignored. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> LKML-Reference: <4CACFDAD.3090900@goop.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 15:59:01 -07:00
Yinghai Lu	73cf624d02	x86, numa: For each node, register the memory blocks actually used Russ reported SGI UV is broken recently. He said: \| The SRAT table shows that memory range is spread over two nodes. \| \| SRAT: Node 0 PXM 0 100000000-800000000 \| SRAT: Node 1 PXM 1 800000000-1000000000 \| SRAT: Node 0 PXM 0 1000000000-1080000000 \| \|Previously, the kernel early_node_map[] would show three entries \|with the proper node. \| \|[ 0.000000] 0: 0x00100000 -> 0x00800000 \|[ 0.000000] 1: 0x00800000 -> 0x01000000 \|[ 0.000000] 0: 0x01000000 -> 0x01080000 \| \|The problem is recent community kernel early_node_map[] shows \|only two entries with the node 0 entry overlapping the node 1 \|entry. \| \| 0: 0x00100000 -> 0x01080000 \| 1: 0x00800000 -> 0x01000000 After looking at the changelog, Found out that it has been broken for a while by following commit \|commit `8716273cae` \|Author: David Rientjes <rientjes@google.com> \|Date: Fri Sep 25 15:20:04 2009 -0700 \| \| x86: Export srat physical topology Before that commit, register_active_regions() is called for every SRAT memory entry right away. Use nodememblk_range[] instead of nodes[] in order to make sure we capture the actual memory blocks registered with each node. nodes[] contains an extended range which spans all memory regions associated with a node, but that does not mean that all the memory in between are included. Reported-by: Russ Anderson <rja@sgi.com> Tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB27BDF.5000800@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> 2.6.33 .34 .35 .36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 15:26:15 -07:00
Zachary Amsden	47008cd887	KVM: x86: Move TSC reset out of vmcb_init The VMCB is reset whenever we receive a startup IPI, so Linux setting the TSC back to zero happens very late in the boot process, destabilizing the TSC. Instead, just set TSC to zero once at VCPU creation time. Why the separate patch? So git-bisect is your friend. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-11 12:36:07 +02:00
Zachary Amsden	58877679fd	KVM: x86: Fix SVM VMCB reset On reset, VMCB TSC should be set to zero. Instead, code was setting tsc_offset to zero, which passes through the underlying TSC. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>	2010-10-11 12:36:07 +02:00
Borislav Petkov	6dcbfe4f0b	x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order This fixes possible cases of not collecting valid error info in the MCE error thresholding groups on F10h hardware. The current code contains a subtle problem of checking only the Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM thresholding group) in its first iteration and breaking out if the bit is cleared. But (!), this MSR contains an offset value, BlkPtr[31:24], which points to the remaining MSRs in this thresholding group which might contain valid information too. But if we bail out only after we checked the valid bit in the first MSR and not the block pointer too, we miss that other information. The thing is, MC4_MISC0[BlkPtr] is not predicated on MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked prior to iterating over the MCI_MISCj thresholding group, irrespective of the MC4_MISC0[Valid] setting. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-11 11:04:36 +02:00
Akinobu Mita	708ff2a009	bitops: make asm-generic/bitops/find.h more generic asm-generic/bitops/find.h has the extern declarations of find_next_bit() and find_next_zero_bit() and the macro definitions of find_first_bit() and find_first_zero_bit(). It is only usable by the architectures which enables CONFIG_GENERIC_FIND_NEXT_BIT and disables CONFIG_GENERIC_FIND_FIRST_BIT. x86 and tile enable both CONFIG_GENERIC_FIND_NEXT_BIT and CONFIG_GENERIC_FIND_FIRST_BIT. These architectures cannot include asm-generic/bitops/find.h in their asm/bitops.h. So ifdefed extern declarations of find_first_bit and find_first_zero_bit() are put in linux/bitops.h. This makes asm-generic/bitops/find.h usable by these architectures and use it. Also this change is needed for the forthcoming duplicated extern declarations cleanup. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Chris Metcalf <cmetcalf@tilera.com>	2010-10-09 21:51:44 +02:00
Konrad Rzeszutek Wilk	6e96366933	x86, iommu: Update header comments with appropriate naming The header comments diverged a bit from the implementation. Lets re-sync them. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> LKML-Reference: <1286564028-2352-3-git-send-email-konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-08 13:11:21 -07:00
Ingo Molnar	7cd2541cf2	Merge commit 'v2.6.36-rc7' into perf/core Conflicts: arch/x86/kernel/module.c Merge reason: Resolve the conflict, pick up fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:46:27 +02:00
Jin Dongming	b62be8ea9d	x86, mce, therm_throt.c: Fix missing curly braces in error handling logic When the feature PTS is not supported by CPU, the sysfile package_power_limit_count for package should not be generated. This patch is used for fixing missing { and }. The patch is not complete as there are other error handling problems in this function - but that can wait until the merge window. Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Reviewed-by: Fenghua Yu <fenghua.yu@initel.com> Acked-by: Jean Delvare <khali@linux-fr.org> Cc: Brown Len <len.brown@intel.com> Cc: Guenter Roeck <guenter.roeck@ericsson.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org> LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:29:20 +02:00
Paul Fox	286e5b97eb	x86, olpc: Don't retry EC commands forever Avoids a potential infinite loop. It was observed once, during an EC hacking/debugging session - not in regular operation. Signed-off-by: Daniel Drake <dsd@laptop.org> Cc: dilinger@queued.net Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:06:09 +02:00
Feng Tang	4d033556f1	x86, earlyprintk: Add hsu early console for Intel Medfield platform Intel Medfield platform has a high speed UART device, which could act as an early console. To enable early printk of HSU console, simply add "earlyprintk=hsu" in kernel command line. Currently we put the code in the early_printk_mrst.c as it is also for Intel MID platforms like the mrst early console. Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: greg@kroah.com LKML-Reference: <1284361736-23011-5-git-send-email-feng.tang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:01:47 +02:00
Feng Tang	c20b5c3318	x86, earlyprintk: Add earlyprintk for Intel Moorestown platform Intel Moorestown platform has a spi-uart device (Maxim3110), which connects to a Designware spi core controller. This patch will add early console function based on it. As it will be used long before the Linux spi subsystem gets initialised, we simply directly manipulate the spi controller's register to achieve the early console func. This is safe as it will be disabled when the devices subsystem gets initialised. To use it, the user needs to enable CONFIG_X86_MRST_EARLY_PRINTK in the kernel config and add "earlyprintk=mrst" in kernel command line. Signed-off-by: Feng Tang <feng.tang@intel.com> Acked-by: Alan Cox <alan@linux.intel.com> Cc: greg@kroah.com LKML-Reference: <1284361736-23011-4-git-send-email-feng.tang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:01:47 +02:00
Feng Tang	5a47c7dae8	x86: Add two helper macros for fixed address mapping Sometimes fixmap will be used to map a physical address which is not page-aligned, so to use it we need to first map it and then add the address offset to the mapped fixed address. These 2 new helpers are suggested by Ingo Molnar to make the process simpler. For a physical address like "phys", a directly usable virtual address can be obtained by virt = (void *)set_fixmap_offset(fixed_idx, phys); or virt = (void *)set_fixmap_offset_nocache(fixed_idx, phys); (depends on whether the physical address is cacheable or not). Signed-off-by: Feng Tang <feng.tang@intel.com> Cc: alan@linux.intel.com Cc: greg@kroah.com Cc: x86@kernel.org LKML-Reference: <1284361736-23011-3-git-send-email-feng.tang@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 10:01:46 +02:00
Andi Kleen	f672b49b07	x86: HWPOISON: Report correct address granuality for huge hwpoison faults An earlier patch fixed the hwpoison fault handling to encode the huge page size in the fault code of the page fault handler. This is needed to report this information in SIGBUS to user space. This is a straight forward patch to pass this information through to the signal handling in the x86 specific fault.c Cc: x86@kernel.org Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: fengguang.wu@intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:46 +02:00
Ingo Molnar	153db80f8c	Merge commit 'v2.6.36-rc7' into core/memblock Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 09:15:00 +02:00
Zhao Yakui	68f4d5a00a	x86, setup: Use string copy operation to optimze copy in kernel compression The kernel decompression code parses the ELF header and then copies the segment to the corresponding destination. Currently it uses slow byte-copy code. This patch makes it use the string copy operations instead. In the test the copy performance can be improved very significantly after using the string copy operation mechanism. 1. The copy time can be reduced from 150ms to 20ms on one Atom machine 2. The copy time can be reduced about 80% on another machine The time is reduced from 7ms to 1.5ms when using 32-bit kernel. The time is reduced from 10ms to 2ms when using 64-bit kernel. Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> LKML-Reference: <1286502453-7043-1-git-send-email-yakui.zhao@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-07 21:23:09 -07:00
H. Peter Anvin	55572b293b	x86, mrst: A function in a header file needs to be marked "inline" A function in a header file needs to be explicitly marked "inline", or gcc will complain if it is not used. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: <stable@kernel.org> v2.6.36 LKML-Reference: <1274295685-6774-3-git-send-email-jacob.jun.pan@linux.intel.com>	2010-10-07 16:45:18 -07:00
Namhyung Kim	a416e9e1dd	x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation On 32-bit non-PAE system, cast to 'phys_addr_t' truncates value before subtraction. Subtracting before cast produce same result but remove following warnings from sparse: arch/x86/include/asm/pgtable_types.h:255:38: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable_types.h:270:38: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable.h:127:32: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable.h:132:32: warning: cast truncates bits from constant value (100000000 becomes 0) arch/x86/include/asm/pgtable.h:344:31: warning: cast truncates bits from constant value (100000000 becomes 0) 64-bit or PAE machines will not be affected by this change. Signed-off-by: Namhyung Kim <namhyung@gmail.com> LKML-Reference: <1285770588-14065-1-git-send-email-namhyung@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-07 16:36:17 -07:00
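Why the two orderings give the same value can be checked with a small user-space sketch. It assumes a 32-bit physical address type and a mask shift of 32, matching the non-PAE case in the commit above; `demo_phys_addr_t`, `DEMO_MASK_SHIFT` and both function names are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 32-bit phys_addr_t and a mask shift of 32. */
typedef uint32_t demo_phys_addr_t;
#define DEMO_MASK_SHIFT 32

/*
 * Cast first: (1ULL << 32) is truncated to 0 by the cast (this is the
 * truncation sparse warns about), then 0 - 1 wraps around to 0xffffffff.
 */
static demo_phys_addr_t mask_cast_then_subtract(void)
{
    return (demo_phys_addr_t)(1ULL << DEMO_MASK_SHIFT) - 1;
}

/*
 * Subtract first: (1ULL << 32) - 1 is 0xffffffff in 64-bit arithmetic,
 * so the cast to 32 bits loses nothing.
 */
static demo_phys_addr_t mask_subtract_then_cast(void)
{
    return (demo_phys_addr_t)((1ULL << DEMO_MASK_SHIFT) - 1);
}
```

Both functions return the same mask; only the second avoids truncating a constant that has bits set above bit 31.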
David Howells	df9ee29270	Fix IRQ flag handling naming Fix the IRQ flag handling naming. In linux/irqflags.h under one configuration, it maps: local_irq_enable() -> raw_local_irq_enable() local_irq_disable() -> raw_local_irq_disable() local_irq_save() -> raw_local_irq_save() ... and under the other configuration, it maps: raw_local_irq_enable() -> local_irq_enable() raw_local_irq_disable() -> local_irq_disable() raw_local_irq_save() -> local_irq_save() ... This is quite confusing. There should be one set of names expected of the arch, and this should be wrapped to give another set of names that are expected by users of this facility. Change this to have the arch provide: flags = arch_local_save_flags() flags = arch_local_irq_save() arch_local_irq_restore(flags) arch_local_irq_disable() arch_local_irq_enable() arch_irqs_disabled_flags(flags) arch_irqs_disabled() arch_safe_halt() Then linux/irqflags.h wraps these to provide: raw_local_save_flags(flags) raw_local_irq_save(flags) raw_local_irq_restore(flags) raw_local_irq_disable() raw_local_irq_enable() raw_irqs_disabled_flags(flags) raw_irqs_disabled() raw_safe_halt() with type checking on the flags 'arguments', and then wraps those to provide: local_save_flags(flags) local_irq_save(flags) local_irq_restore(flags) local_irq_disable() local_irq_enable() irqs_disabled_flags(flags) irqs_disabled() safe_halt() with tracing included if enabled. The arch functions can now all be inline functions rather than some of them having to be macros. 
Signed-off-by: David Howells <dhowells@redhat.com> [X86, FRV, MN10300] Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> [Tile] Signed-off-by: Michal Simek <monstr@monstr.eu> [Microblaze] Tested-by: Catalin Marinas <catalin.marinas@arm.com> [ARM] Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com> [AVR] Acked-by: Tony Luck <tony.luck@intel.com> [IA-64] Acked-by: Hirokazu Takata <takata@linux-m32r.org> [M32R] Acked-by: Greg Ungerer <gerg@uclinux.org> [M68K/M68KNOMMU] Acked-by: Ralf Baechle <ralf@linux-mips.org> [MIPS] Acked-by: Kyle McMartin <kyle@mcmartin.ca> [PA-RISC] Acked-by: Paul Mackerras <paulus@samba.org> [PowerPC] Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [S390] Acked-by: Chen Liqin <liqin.chen@sunplusct.com> [Score] Acked-by: Matt Fleming <matt@console-pimps.org> [SH] Acked-by: David S. Miller <davem@davemloft.net> [Sparc] Acked-by: Chris Zankel <chris@zankel.net> [Xtensa] Reviewed-by: Richard Henderson <rth@twiddle.net> [Alpha] Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp> [H8300] Cc: starvik@axis.com [CRIS] Cc: jesper.nilsson@axis.com [CRIS] Cc: linux-cris-kernel@axis.com	2010-10-07 14:08:55 +01:00
Linus Torvalds	34984f54b7	Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm * 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm: xen: do not initialize PV timers on HVM if !xen_have_vector_callback xen: do not set xenstored_ready before xenbus_probe on hvm	2010-10-06 09:51:28 -07:00
Jeremy Fitzhardinge	161b0275e2	x86, mm: Add RESERVE_BRK_ARRAY() helper This is useful when converting static arrays into boot-time brk allocated objects. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4C805EEA.1080205@goop.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 22:16:54 -07:00
Yinghai Lu	16c36f743b	x86, memblock: Remove __memblock_x86_find_in_range_size() Fold it into memblock_x86_find_in_range(), and change bad_addr_size() to check_reserve_memblock(). So whole memblock_x86_find_in_range_size() code is more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CAA4DEC.4000401@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:45:43 -07:00
Yinghai Lu	1d931264af	x86-32, memblock: Make add_highpages honor early reserved ranges Originally the only early reserved range that is overlapped with high pages is "KVA RAM", but we already do remove that from the active ranges. However, it turns out Xen could have that kind of overlapping to support memory ballooning. So we need to make add_highpage_with_active_regions() subtract memblock reserved just like low ram; this is the proper design anyway. This patch refactors get_free_all_memory_range() so that it can be used by add_highpage_with_active_regions(). Also we don't need to remove "KVA RAM" from active ranges. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CABB183.1040607@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:44:35 -07:00
Yinghai Lu	9f4c13964b	x86, memblock: Fix crashkernel allocation Cai Qian found crashkernel is broken with the x86 memblock changes. 1. crashkernel=128M@32M always reported that range is used, even if the first kernel is small and does not use that range 2. we always got following report when using "kexec -p" Could not find a free area of memory of a000 bytes... locate_hole failed The root cause is that generic memblock_find_in_range() will try to allocate from the top of the range, whereas the kexec code was written assuming that allocation was always near the bottom and that it could blindly extend memory upward. Unfortunately the kexec code doesn't have a system for requesting the range that it really needs, so this is subject to probabilistic failures. This patch hacks around the problem by limiting the target range heuristically to below the traditional bzImage max range. This number is arbitrary and not always correct, and a much better result would be obtained by having kexec communicate this number based on the kernel header information and any appropriate command line options. Reported-and-Bisected-by: CAI Qian <caiqian@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CABAF2A.5090501@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:43:14 -07:00
Linus Torvalds	39c12be86a	Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf trace scripting: Fix extern struct definitions perf ui hist browser: Fix segfault on 'a' for annotate perf tools: Fix build breakage perf, x86: Handle in flight NMIs on P4 platform oprofile, ARM: Release resources on failure oprofile: Add Support for Intel CPU Family 6 / Model 29	2010-10-05 11:57:37 -07:00
Linus Torvalds	5336377d62	modules: Fix module_bug_list list corruption race With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-05 11:29:27 -07:00
Stefano Stabellini	31e7e931cd	xen: do not initialize PV timers on HVM if !xen_have_vector_callback if !xen_have_vector_callback do not initialize PV timer unconditionally because we still don't know how many cpus are available and if there is more than one we won't be able to receive the timer interrupts on cpu > 0. This patch fixes a hang at boot when Xen does not support vector callbacks and the guest has multiple vcpus. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>	2010-10-05 13:39:23 +01:00
Andi Kleen	c62f981f93	perf, gcc-4.6: Fix set but unused variable Just dead code I believe. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: andi@firstfloor.org Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-05 09:48:07 +02:00
Ingo Molnar	00e8976200	Merge branch 'perf/urgent' into perf/core Conflicts: tools/perf/util/ui/browsers/hists.c Merge reason: fix the conflict and merge in changes for dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-05 09:47:14 +02:00
Borislav Petkov	366d4a43b1	x86, cpu: Fix X86_FEATURE_NOPL `ba0593bf55` cleared the aforementioned cpuid bit only on 32-bit due to various problems with Virtual PC. This somehow got lost during the 32- + 64-bit merge so restore the feature bit on 64-bit. For that, set it explicitly for non-constant arguments of cpu_has(). Update comment for future reference. Signed-off-by: Borislav Petkov <bp@alien8.de> LKML-Reference: <20101004073127.GA20305@liondog.tnic> Cc: Ryan O'Neill <ryan@innosecc.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-04 11:22:24 -07:00
Linus Torvalds	5a4bbd01c8	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq: [CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc [CPUFREQ] acpi-cpufreq: add missing __percpu markup	2010-10-04 11:14:21 -07:00
Thomas Gleixner	3bb9808e99	x86: Use genirq Kconfig Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <20100927121843.314600915@linutronix.de> Reviewed-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Ingo Molnar <mingo@elte.hu>	2010-10-04 11:01:15 +02:00
Andreas Herrmann	5c80cc78de	x86, amd_nb: Enable GART support for AMD family 0x15 CPUs AMD CPU family 0x15 still supports GART for compatibility reasons. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930124316.GG20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	d4fbe4f035	x86, amd: Use compute unit information to determine thread siblings This information is vital for different load balancing policies. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930124156.GF20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	6057b4d331	x86, amd: Extract compute unit information for AMD CPUs Get compute unit information from CPUID Fn8000_001E_EBX. (See AMD CPUID Specification - publication # 25481, revision 2.34, September 2010.) Note that each core on a compute unit still has a core_id of its own. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123857.GE20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	23588c38a8	x86, amd: Add support for CPUID topology extension of AMD CPUs Node information (ID, number of internal nodes) is provided via CPUID Fn8000_001e_ECX. See AMD CPUID Specification (Publication # 25481, Revision 2.34, September 2010). Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123628.GD20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	420b13b60a	x86, nmi: Support NMI watchdog on newer AMD CPU families CPU families 0x12, 0x14 and 0x15 support this functionality. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123357.GC20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:32 -07:00
Andreas Herrmann	3fdbf004c1	x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs Instead of adapting the CPU family check in amd_special_default_mtrr() for each new CPU family assume that all new AMD CPUs support the necessary bits in SYS_CFG MSR. Tom2Enabled is architectural (defined in APM Vol.2). Tom2ForceMemTypeWB is defined in all BKDGs starting with K8 NPT. In pre K8-NPT BKDG this bit is reserved (read as zero). W/o this adaption Linux would unnecessarily complain about bad MTRR settings on every new AMD CPU family, e.g. [ 0.000000] WARNING: BIOS bug: CPU MTRRs don't cover all of memory, losing 4863MB of RAM. Cc: stable@kernel.org # .32.x, .35.x Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100930123235.GB20545@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-01 16:18:31 -07:00
H. Peter Anvin	86ffb08519	Merge remote branch 'origin/x86/cpu' into x86/amd-nb	2010-10-01 16:18:11 -07:00
Linus Torvalds	f4a3330d76	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, hpet: Fix bogus error check in hpet_assign_irq() x86, irq: Plug memory leak in sparse irq x86, cpu: After uncapping CPUID, re-run CPU feature detection	2010-10-01 15:02:41 -07:00
Robert Richter	5140434d5f	oprofile, x86: Simplify init/exit functions Now, that we only call the exit function if init succeeds with commit: `979048e` oprofile: don't call arch exit code from init code on failure we can simplify the x86 init/exit functions too. Variable using_nmi becomes obsolete. Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-01 17:05:47 +02:00
Jiri Olsa	f6dedecc37	oprofile, x86: Adding backtrace dump for 32bit process in compat mode This patch implements the oprofile backtrace generation for 32 bit applications running in the 64bit environment (compat mode). With this change it's possible to get backtrace for 32bits applications under the 64bits environment using oprofile's callgraph options. opcontrol --setup -c ... opreport -l -cg ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-01 16:07:18 +02:00
Jiri Olsa	40c6b3cb64	oprofile, x86: Using struct stack_frame for 64bit processes dump Removing unnecessary struct frame_head and replacing it with struct stack_frame. The struct stack_frame is already defined and used in other places in kernel, so there's no reason to define new structure. Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Robert Richter <robert.richter@amd.com>	2010-10-01 16:07:09 +02:00
Thomas Gleixner	0219896228	x86, hpet: Fix bogus error check in hpet_assign_irq() create_irq() returns -1 if the interrupt allocation failed, but the code checks for irq == 0. Use create_irq_nr() instead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <alpine.LFD.2.00.1009282310360.2416@localhost6.localdomain6> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-30 15:57:35 -07:00
Thomas Gleixner	1cf180c94e	x86, irq: Plug memory leak in sparse irq free_irq_cfg() is not freeing the cpumask_vars in irq_cfg. Fixing this triggers a use after free caused by the fact that copying struct irq_cfg is done with memcpy, which copies the pointer not the cpumask. Fix both places. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <alpine.LFD.2.00.1009282052570.2416@localhost6.localdomain6> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-30 15:57:35 -07:00
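The copy problem described in the commit above (memcpy duplicates the pointer, not the mask it points to) can be illustrated with a user-space analogy. This is a sketch under assumptions, not the kernel code: `struct demo_cfg`, `DEMO_MASK_WORDS` and the three helpers are hypothetical, and a heap-allocated word array stands in for a separately allocated cpumask.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_MASK_WORDS 4

/* Hypothetical stand-in for a config struct that owns an allocated mask. */
struct demo_cfg {
    unsigned long *mask; /* separately allocated, owned by this struct */
    int vector;
};

/* Allocate the mask; returns 0 on success, -1 on allocation failure. */
static int demo_cfg_init(struct demo_cfg *cfg)
{
    cfg->mask = calloc(DEMO_MASK_WORDS, sizeof(*cfg->mask));
    cfg->vector = 0;
    return cfg->mask ? 0 : -1;
}

/*
 * Copy field by field. A plain memcpy(dst, src, sizeof(*dst)) would make
 * dst->mask alias src->mask: dst's own buffer would leak, and freeing one
 * struct would leave the other with a dangling pointer.
 */
static void demo_cfg_copy(struct demo_cfg *dst, const struct demo_cfg *src)
{
    memcpy(dst->mask, src->mask, DEMO_MASK_WORDS * sizeof(*dst->mask));
    dst->vector = src->vector;
}

/* Release the mask as well as resetting the pointer. */
static void demo_cfg_free(struct demo_cfg *cfg)
{
    free(cfg->mask);
    cfg->mask = NULL;
}
```

With the deep copy, each struct can be freed independently of the other.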
Pekka Enberg	3682930623	[CPUFREQ] Fix memory leaks in pcc_cpufreq_do_osc If acpi_evaluate_object() function call doesn't fail, we must kfree() output.buffer before returning from pcc_cpufreq_do_osc(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Dave Jones <davej@redhat.com>	2010-09-30 16:14:23 -04:00
Namhyung Kim	86cf147494	[CPUFREQ] acpi-cpufreq: add missing __percpu markup acpi_perf_data is a percpu pointer but was missing __percpu markup. Add it. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Jones <davej@redhat.com>	2010-09-30 16:14:22 -04:00
Cyrill Gorcunov	03e22198d2	perf, x86: Handle in flight NMIs on P4 platform Stephane reported we've forgot to guard the P4 platform against spurious in-flight performance IRQs. Fix it. This fixes potential spurious 'dazed and confused' NMI messages. Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: Don Zickus <dzickus@redhat.com> Cc: fweisbec@gmail.com Cc: peterz@infradead.org Cc: Robert Richter <robert.richter@amd.com> Cc: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1285815698-4298-1-git-send-email-dzickus@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-30 09:17:59 +02:00
Dan Carpenter	b365a85c68	x86, UV: Use allocated buffer in tlb_uv.c:tunables_read() The original code didn't check that the value returned from snprintf() was less than the size of the buffer. Although it didn't cause a runtime bug in this case, it makes the static checkers complain. Andrew Morton suggested a dynamically sized buffer would be cleaner. Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Cliff Wickman <cpw@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> LKML-Reference: <20100929083118.GA6376@bicker> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-30 09:11:27 +02:00
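The two approaches mentioned in the commit above, checking the snprintf() return value against the buffer size and sizing the buffer dynamically, can be sketched in user-space C. The function names and the format strings are hypothetical; the kernel code uses its own helpers and its own set of tunables.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Fixed buffer: snprintf() returns the length the full output would have
 * had, so a return value >= size means the output was truncated.
 */
static int demo_fits(char *buf, size_t size, int value)
{
    int ret = snprintf(buf, size, "%d", value);

    return ret >= 0 && (size_t)ret < size;
}

/*
 * Dynamically sized buffer: measure first with a zero-sized call, then
 * allocate exactly what is needed. The caller frees the result.
 */
static char *demo_format_tunables(int a, int b)
{
    int len = snprintf(NULL, 0, "%d %d\n", a, b);
    char *buf;

    if (len < 0)
        return NULL;
    buf = malloc((size_t)len + 1);
    if (!buf)
        return NULL;
    snprintf(buf, (size_t)len + 1, "%d %d\n", a, b);
    return buf;
}
```

The dynamic variant removes the need to guess a maximum length, which is the property the suggestion in the commit was after.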
Namhyung Kim	bd126b23a2	ACPI: add missing __percpu markup in arch/x86/kernel/acpi/cstate.c cpu_cstate_entry is a percpu pointer but was missing __percpu markup. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Len Brown <len.brown@intel.com>	2010-09-28 21:38:20 -04:00
H. Peter Anvin	d900329e20	x86, cpu: After uncapping CPUID, re-run CPU feature detection After uncapping the CPUID level, we need to also re-run the CPU feature detection code. This resolves kernel bugzilla 16322. Reported-by: boris64 <bugzilla.kernel.org@boris64.net> Cc: <stable@kernel.org> v2.6.29..2.6.35 LKML-Reference: <tip-@git.kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-28 16:33:14 -07:00
Linus Torvalds	050026feae	Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Avoid 'constant_test_bit()' misoptimization due to cast to non-volatile	2010-09-27 21:19:27 -07:00
Linus Torvalds	6a6aa2b7e4	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Fix rounding-bug in __unmap_single x86/amd-iommu: Work around S3 BIOS bug x86/amd-iommu: Set iommu configuration flags in enable-loop x86, setup: Fix earlyprintk=serial,0x3f8,115200 x86, setup: Fix earlyprintk=serial,ttyS0,115200	2010-09-27 12:22:21 -07:00

12946 Commits