linux

Author	SHA1	Message	Date
H. Peter Anvin	f026cfa82f	Revert "x86-64/efi: Use EFI to deal with platform wall clock" This reverts commit `bacef661ac`. This commit has been found to cause serious regressions on a number of ASUS machines at the least. We probably need to provide a 1:1 map in addition to the EFI virtual memory map in order for this to work. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reported-and-bisected-by: Jérôme Carretero <cJ-ko@zougloub.eu> Cc: Jan Beulich <jbeulich@suse.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20120805172903.5f8bb24c@zougloub.eu	2012-08-14 09:58:25 -07:00
Thomas Renninger	095adbb644	ACPI: Only count valid srat memory structures Otherwise you could run into: WARN_ON in numa_register_memblks(), because node_possible_map is zero References: https://bugzilla.novell.com/show_bug.cgi?id=757888 On this machine (ProLiant ML570 G3) the SRAT table contains: - No processor affinities - One memory affinity structure (which is set disabled) CC: Per Jessen <per@opensuse.org> CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Len Brown <len.brown@intel.com>	2012-08-03 00:15:53 -04:00
Linus Torvalds	4cb38750d4	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/mm changes from Peter Anvin: "The big change here is the patchset by Alex Shi to use INVLPG to flush only the affected pages when we only need to flush a small page range. It also removes the special INVALIDATE_TLB_VECTOR interrupts (32 vectors!) and replace it with an ordinary IPI function call." Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next to changed line) * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/tlb: Fix build warning and crash when building for !SMP x86/tlb: do flush_tlb_kernel_range by 'invlpg' x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR x86/tlb: enable tlb flush range support for x86 mm/mmu_gather: enable tlb flush range in generic mmu_gather x86/tlb: add tlb_flushall_shift knob into debugfs x86/tlb: add tlb_flushall_shift for specific CPU x86/tlb: fall back to flush all when meet a THP large page x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86/tlb_info: get last level TLB entry number of CPU x86: Add read_mostly declaration/definition to variables from smp.h x86: Define early read-mostly per-cpu macros	2012-07-26 13:17:17 -07:00
Linus Torvalds	0a2fe19ccc	Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pul x86/efi changes from Ingo Molnar: "This tree adds an EFI bootloader handover protocol, which, once supported on the bootloader side, will make bootup faster and might result in simpler bootloaders. The other change activates the EFI wall clock time accessors on x86-64 as well, instead of the legacy RTC readout." * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: Handover Protocol x86-64/efi: Use EFI to deal with platform wall clock	2012-07-26 13:13:25 -07:00
Alex Shi	effee4b9b3	x86/tlb: do flush_tlb_kernel_range by 'invlpg' This patch do flush_tlb_kernel_range by 'invlpg'. The performance pay and gain was analyzed in previous patch (x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range). In the testing: http://lkml.org/lkml/2012/6/21/10 The pay is mostly covered by long kernel path, but the gain is still quite clear, memory access in user APP can increase 30+% when kernel execute this funtion. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-10-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:14 -07:00
Alex Shi	52aec3308d	x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR There are 32 INVALIDATE_TLB_VECTOR now in kernel. That is quite big amount of vector in IDT. But it is still not enough, since modern x86 sever has more cpu number. That still causes heavy lock contention in TLB flushing. The patch using generic smp call function to replace it. That saved 32 vector number in IDT, and resolved the lock contention in TLB flushing on large system. In the NHM EX machine 4P * 8cores * HT = 64 CPUs, hackbench pthread has 3% performance increase. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-9-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:13 -07:00
Alex Shi	611ae8e3f5	x86/tlb: enable tlb flush range support for x86 Not every tlb_flush execution moment is really need to evacuate all TLB entries, like in munmap, just few 'invlpg' is better for whole process performance, since it leaves most of TLB entries for later accessing. This patch also rewrite flush_tlb_range for 2 purposes: 1, split it out to get flush_blt_mm_range function. 2, clean up to reduce line breaking, thanks for Borislav's input. My micro benchmark 'mummap' http://lkml.org/lkml/2012/5/17/59 show that the random memory access on other CPU has 0~50% speed up on a 2P * 4cores * HT NHM EP while do 'munmap'. Thanks Yongjie's testing on this patch: ------------- I used Linux 3.4-RC6 w/ and w/o his patches as Xen dom0 and guest kernel. After running two benchmarks in Xen HVM guest, I found his patches brought about 1%~3% performance gain in 'kernel build' and 'netperf' testing, though the performance gain was not very stable in 'kernel build' testing. Some detailed testing results are below. Testing Environment: Hardware: Romley-EP platform Xen version: latest upstream Linux kernel: 3.4-RC6 Guest vCPU number: 8 NIC: Intel 82599 (10GB bandwidth) In 'kernel build' testing in guest: Command line \| performance gain make -j 4 \| 3.81% make -j 8 \| 0.37% make -j 16 \| -0.52% In 'netperf' testing, we tested TCP_STREAM with default socket size 16384 byte as large packet and 64 byte as small packet. I used several clients to add networking pressure, then 'netperf' server automatically generated several threads to response them. I also used large-size packet and small-size packet in the testing. Packet size \| Thread number \| performance gain 16384 bytes \| 4 \| 0.02% 16384 bytes \| 8 \| 2.21% 16384 bytes \| 16 \| 2.04% 64 bytes \| 4 \| 1.07% 64 bytes \| 8 \| 3.31% 64 bytes \| 16 \| 0.71% Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-8-git-send-email-alex.shi@intel.com Tested-by: Ren, Yongjie <yongjie.ren@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:11 -07:00
Alex Shi	3df3212f97	x86/tlb: add tlb_flushall_shift knob into debugfs kernel will replace cr3 rewrite with invlpg when tlb_flush_entries <= active_tlb_entries / 2^tlb_flushall_factor if tlb_flushall_factor is -1, kernel won't do this replacement. User can modify its value according to specific CPU/applications. Thanks for Borislav providing the help message of CONFIG_DEBUG_TLBFLUSH. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-6-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:10 -07:00
Alex Shi	c4211f42d3	x86/tlb: add tlb_flushall_shift for specific CPU Testing show different CPU type(micro architectures and NUMA mode) has different balance points between the TLB flush all and multiple invlpg. And there also has cases the tlb flush change has no any help. This patch give a interface to let x86 vendor developers have a chance to set different shift for different CPU type. like some machine in my hands, balance points is 16 entries on Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing help. For untested machine, do a conservative optimization, same as NHM CPU. Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:10 -07:00
Alex Shi	d8dfe60d6d	x86/tlb: fall back to flush all when meet a THP large page We don't need to flush large pages by PAGE_SIZE step, that just waste time. and actually, large page don't need 'invlpg' optimizing according to our micro benchmark. So, just flush whole TLB is enough for them. The following result is tested on a 2CPU * 4cores * 2HT NHM EP machine, with THP 'always' setting. Multi-thread testing, '-t' paramter is thread number: without this patch with this patch ./mprotect -t 1 14ns 13ns ./mprotect -t 2 13ns 13ns ./mprotect -t 4 12ns 11ns ./mprotect -t 8 14ns 10ns ./mprotect -t 16 28ns 28ns ./mprotect -t 32 54ns 52ns ./mprotect -t 128 200ns 200ns Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-4-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:09 -07:00
Alex Shi	e7b52ffd45	x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range x86 has no flush_tlb_range support in instruction level. Currently the flush_tlb_range just implemented by flushing all page table. That is not the best solution for all scenarios. In fact, if we just use 'invlpg' to flush few lines from TLB, we can get the performance gain from later remain TLB lines accessing. But the 'invlpg' instruction costs much of time. Its execution time can compete with cr3 rewriting, and even a bit more on SNB CPU. So, on a 512 4KB TLB entries CPU, the balance points is at: (512 - X) * 100ns(assumed TLB refill cost) = X(TLB flush entries) * 100ns(assumed invlpg cost) Here, X is 256, that is 1/2 of 512 entries. But with the mysterious CPU pre-fetcher and page miss handler Unit, the assumed TLB refill cost is far lower then 100ns in sequential access. And 2 HT siblings in one core makes the memory access more faster if they are accessing the same memory. So, in the patch, I just do the change when the target entries is less than 1/16 of whole active tlb entries. Actually, I have no data support for the percentage '1/16', so any suggestions are welcomed. As to hugetlb, guess due to smaller page table, and smaller active TLB entries, I didn't see benefit via my benchmark, so no optimizing now. My micro benchmark show in ideal scenarios, the performance improves 70 percent in reading. And in worst scenario, the reading/writing performance is similar with unpatched 3.4-rc4 kernel. Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP 'always': multi thread testing, '-t' paramter is thread number: with patch unpatched 3.4-rc4 ./mprotect -t 1 14ns 24ns ./mprotect -t 2 13ns 22ns ./mprotect -t 4 12ns 19ns ./mprotect -t 8 14ns 16ns ./mprotect -t 16 28ns 26ns ./mprotect -t 32 54ns 51ns ./mprotect -t 128 200ns 199ns Single process with sequencial flushing and memory accessing: with patch unpatched 3.4-rc4 ./mprotect 7ns 11ns ./mprotect -p 4096 -l 8 -n 10240 21ns 21ns [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com has additional performance numbers. ] Signed-off-by: Alex Shi <alex.shi@intel.com> Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-06-27 19:29:07 -07:00
Jan Beulich	0d26d1d873	x86/mm: Mark free_initrd_mem() as __init ... matching various other architectures. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4FDF1F5C020000780008A661@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-20 14:33:47 +02:00
Wanpeng Li	9efc31b81d	x86/mm: Fix some kernel-doc warnings Fix kernel-doc warnings in arch/x86/mm/ioremap.c and arch/x86/mm/pageattr.c, just like this one: Warning(arch/x86/mm/ioremap.c:204): No description found for parameter 'phys_addr' Warning(arch/x86/mm/ioremap.c:204): Excess function parameter 'offset' description in 'ioremap_nocache' Signed-off-by: Wanpeng Li <liwp@linux.vnet.ibm.com> Cc: Gavin Shan <shangw@linux.vnet.ibm.com> Cc: Wanpeng Li <liwp.linux@gmail.com> Link: http://lkml.kernel.org/r/1339296652-2935-1-git-send-email-liwp.linux@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-11 10:54:45 +02:00
Yinghai Lu	bd2753b2dd	x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space Robin found this regression: \| I just tried to boot an 8TB system. It fails very early in boot with: \| Kernel panic - not syncing: Cannot find space for the kernel page tables git bisect commit `722bc6b167`. A git revert of that commit does boot past that point on the 8TB configuration. That commit will add up extra pages for all memory range even above 4g. Try to limit that extra page count adding to first entry only. Bisected-by: Robin Holt <holt@sgi.com> Tested-by: Robin Holt <holt@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9BZMYA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-08 11:40:50 +02:00
Yasuaki Ishimatsu	4af463d28f	x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init() When hot-adding a CPU, the system outputs following messages since node_to_cpumask_map[2] was not allocated memory. Booting Node 2 Processor 32 APIC 0xc0 node_to_cpumask_map[2] NULL Pid: 0, comm: swapper/32 Tainted: G A 3.3.5-acd #21 Call Trace: [<ffffffff81048845>] debug_cpumask_set_cpu+0x155/0x160 [<ffffffff8105e28a>] ? add_timer_on+0xaa/0x120 [<ffffffff8150665f>] numa_add_cpu+0x1e/0x22 [<ffffffff815020bb>] identify_cpu+0x1df/0x1e4 [<ffffffff815020d6>] identify_econdary_cpu+0x16/0x1d [<ffffffff81504614>] smp_store_cpu_info+0x3c/0x3e [<ffffffff81505263>] smp_callin+0x139/0x1be [<ffffffff815052fb>] start_secondary+0x13/0xeb The reason is that the bit of node 2 was not set at numa_nodes_parsed. numa_nodes_parsed is set by only acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init. Thus even if hot-added memory which is same PXM as hot-added CPU is written in ACPI SRAT Table, if the hot-added CPU is not written in ACPI SRAT table, numa_nodes_parsed is not set. But according to ACPI Spec Rev 5.0, it says about ACPI SRAT table as follows: This optional table provides information that allows OSPM to associate processors and memory ranges, including ranges of memory provided by hot-added memory devices, with system localities / proximity domains and clock domains. It means that ACPI SRAT table only provides information for CPUs present at boot time and for memory including hot-added memory. So hot-added memory is written in ACPI SRAT table, but hot-added CPU is not written in it. Thus numa_nodes_parsed should be set by not only acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init but also acpi_numa_memory_affinity_init for the case. Additionally, if system has cpuless memory node, acpi_numa_processor_affinity_init / acpi_numa_x2apic_affinity_init cannot set numa_nodes_parseds since these functions cannot find cpu description for the node. In this case, numa_nodes_parsed needs to be set by acpi_numa_memory_affinity_init. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: liuj97@gmail.com Cc: kosaki.motohiro@gmail.com Link: http://lkml.kernel.org/r/4FCC2098.4030007@jp.fujitsu.com [ merged it ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-06 11:58:39 +02:00
Jan Beulich	bacef661ac	x86-64/efi: Use EFI to deal with platform wall clock Other than ix86, x86-64 on EFI so far didn't set the {g,s}et_wallclock accessors to the EFI routines, thus incorrectly using raw RTC accesses instead. Simply removing the #ifdef around the respective code isn't enough, however: While so far early get-time calls were done in physical mode, this doesn't work properly for x86-64, as virtual addresses would still need to be set up for all runtime regions (which wasn't the case on the system I have access to), so instead the patch moves the call to efi_enter_virtual_mode() ahead (which in turn allows to drop all code related to calling efi-get-time in physical mode). Additionally the earlier calling of efi_set_executable() requires the CPA code to cope, i.e. during early boot it must be avoided to call cpa_flush_array(), as the first thing this function does is a BUG_ON(irqs_disabled()). Also make the two EFI functions in question here static - they're not being referenced elsewhere. Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Matt Fleming <matt.fleming@intel.com> Acked-by: Matthew Garrett <mjg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FBFBF5F020000780008637F@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-06-06 11:48:05 +02:00
H. Peter Anvin	bbd771474e	Merge branch 'x86/trampoline' into x86/urgent x86/trampoline contains an urgent commit which is necessarily on a newer baseline. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-05-30 12:11:32 -07:00
John Dykstra	fa83523f45	x86/mm/pat: Improve scaling of pat_pagerange_is_ram() Function pat_pagerange_is_ram() scales poorly to large address ranges, because it probes the resource tree for each page. On a 2.6 GHz Opteron, this function consumes 34 ms for a 1 GB range. It is called twice during untrack_pfn_vma(), slowing process cleanup and handicapping the OOM killer. This replacement consumes less than 1ms, under the same conditions. Signed-off-by: John Dykstra <jdykstra@cray.com> on behalf of Cray Inc. Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1337980366.1979.6.camel@redwood [ Small stylistic cleanups and renames ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-05-30 10:57:11 +02:00
Bjorn Helgaas	365811d6f9	x86: print physical addresses consistently with other parts of kernel Print physical address info in a style consistent with the %pR style used elsewhere in the kernel. For example: -found SMP MP-table at [ffff8800000fce90] fce90 +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90] -initial memory mapped : 0 - 20000000 +initial memory mapped: [mem 0x00000000-0x1fffffff] -Base memory trampoline at [ffff88000009c000] 9c000 size 8192 +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000] -SRAT: Node 0 PXM 0 0-80000000 +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-29 16:22:21 -07:00
Linus Torvalds	644473e9c6	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace enhancements from Eric Biederman: "This is a course correction for the user namespace, so that we can reach an inexpensive, maintainable, and reasonably complete implementation. Highlights: - Config guards make it impossible to enable the user namespace and code that has not been converted to be user namespace safe. - Use of the new kuid_t type ensures the if you somehow get past the config guards the kernel will encounter type errors if you enable user namespaces and attempt to compile in code whose permission checks have not been updated to be user namespace safe. - All uids from child user namespaces are mapped into the initial user namespace before they are processed. Removing the need to add an additional check to see if the user namespace of the compared uids remains the same. - With the user namespaces compiled out the performance is as good or better than it is today. - For most operations absolutely nothing changes performance or operationally with the user namespace enabled. - The worst case performance I could come up with was timing 1 billion cache cold stat operations with the user namespace code enabled. This went from 156s to 164s on my laptop (or 156ns to 164ns per stat operation). - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value. Most uid/gid setting system calls treat these value specially anyway so attempting to use -1 as a uid would likely cause entertaining failures in userspace. - If setuid is called with a uid that can not be mapped setuid fails. I have looked at sendmail, login, ssh and every other program I could think of that would call setuid and they all check for and handle the case where setuid fails. - If stat or a similar system call is called from a context in which we can not map a uid we lie and return overflowuid. The LFS experience suggests not lying and returning an error code might be better, but the historical precedent with uids is different and I can not think of anything that would break by lying about a uid we can't map. - Capabilities are localized to the current user namespace making it safe to give the initial user in a user namespace all capabilities. My git tree covers all of the modifications needed to convert the core kernel and enough changes to make a system bootable to runlevel 1." Fix up trivial conflicts due to nearby independent changes in fs/stat.c * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits) userns: Silence silly gcc warning. cred: use correct cred accessor with regards to rcu read lock userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq userns: Convert cgroup permission checks to use uid_eq userns: Convert tmpfs to use kuid and kgid where appropriate userns: Convert sysfs to use kgid/kuid where appropriate userns: Convert sysctl permission checks to use kuid and kgids. userns: Convert proc to use kuid/kgid where appropriate userns: Convert ext4 to user kuid/kgid where appropriate userns: Convert ext3 to use kuid/kgid where appropriate userns: Convert ext2 to use kuid/kgid where appropriate. userns: Convert devpts to use kuid/kgid where appropriate userns: Convert binary formats to use kuid/kgid where appropriate userns: Add negative depends on entries to avoid building code that is userns unsafe userns: signal remove unnecessary map_cred_ns userns: Teach inode_capable to understand inodes whose uids map to other namespaces. userns: Fail exec for suid and sgid binaries with ids outside our user namespace. userns: Convert stat to return values mapped from kuids and kgids userns: Convert user specfied uids and gids in chown into kuids and kgid userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs ...	2012-05-23 17:42:39 -07:00
Linus Torvalds	02171b4a7c	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "This tree includes a micro-optimization that avoids cr3 switches during idling; it fixes corner cases and there's also small cleanups" Fix up trivial context conflict with the percpu_xx -> this_cpu_xx changes. * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix accounting in kernel_physical_mapping_init() x86/tlb: Clean up and unify TLB_FLUSH_ALL definition x86: Drop obsolete ARCH_BOOTMEM support x86, tlb: Switch cr3 in leave_mm() only when needed x86/mm: Fix the size calculation of mapping tables	2012-05-23 11:06:59 -07:00
Linus Torvalds	269af9a1a0	Merge branch 'x86-extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull exception table generation updates from Ingo Molnar: "The biggest change here is to allow the build-time sorting of the exception table, to speed up booting. This is achieved by the architecture enabling BUILDTIME_EXTABLE_SORT. This option is enabled for x86 and MIPS currently. On x86 a number of fixes and changes were needed to allow build-time sorting of the exception table, in particular a relocation invariant exception table format was needed. This required the abstracting out of exception table protocol and the removal of 20 years of accumulated assumptions about the x86 exception table format. While at it, this tree also cleans up various other aspects of exception handling, such as early(er) exception handling for rdmsr_safe() et al. All in one, as the result of these changes the x86 exception code is now pretty nice and modern. As an added bonus any regressions in this code will be early and violent crashes, so if you see any of those, you'll know whom to blame!" Fix up trivial conflicts in arch/{mips,x86}/Kconfig files due to nearby modifications of other core architecture options. * 'x86-extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits) Revert "x86, extable: Disable presorted exception table for now" scripts/sortextable: Handle relative entries, and other cleanups x86, extable: Switch to relative exception table entries x86, extable: Disable presorted exception table for now x86, extable: Add _ASM_EXTABLE_EX() macro x86, extable: Remove open-coded exception table entries in arch/x86/ia32/ia32entry.S x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/xsave.h x86, extable: Remove open-coded exception table entries in arch/x86/include/asm/kvm_host.h x86, extable: Remove the now-unused __ASM_EX_SEC macros x86, extable: Remove open-coded exception table entries in arch/x86/xen/xen-asm_32.S x86, extable: Remove open-coded exception table entries in arch/x86/um/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/usercopy_32.c x86, extable: Remove open-coded exception table entries in arch/x86/lib/putuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/getuser.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/csum-copy_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_nocache_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/copy_user_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/checksum_32.S x86, extable: Remove open-coded exception table entries in arch/x86/kernel/test_rodata.c x86, extable: Remove open-coded exception table entries in arch/x86/kernel/entry_64.S ...	2012-05-23 10:44:35 -07:00
Linus Torvalds	d79ee93de9	Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "The biggest change is the cleanup/simplification of the load-balancer: instead of the current practice of architectures twiddling scheduler internal data structures and providing the scheduler domains in colorfully inconsistent ways, we now have generic scheduler code in kernel/sched/core.c:sched_init_numa() that looks at the architecture's node_distance() parameters and (while not fully trusting it) deducts a NUMA topology from it. This inevitably changes balancing behavior - hopefully for the better. There are various smaller optimizations, cleanups and fixlets as well" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Taint kernel with TAINT_WARN after sleep-in-atomic bug sched: Remove stale power aware scheduling remnants and dysfunctional knobs sched/debug: Fix printing large integers on 32-bit platforms sched/fair: Improve the ->group_imb logic sched/nohz: Fix rq->cpu_load[] calculations sched/numa: Don't scale the imbalance sched/fair: Revert sched-domain iteration breakage sched/x86: Rewrite set_cpu_sibling_map() sched/numa: Fix the new NUMA topology bits sched/numa: Rewrite the CONFIG_NUMA sched domain support sched/fair: Propagate 'struct lb_env' usage into find_busiest_group sched/fair: Add some serialization to the sched_domain load-balance walk sched/fair: Let minimally loaded cpu balance the group sched: Change rq->nr_running to unsigned int x86/numa: Check for nonsensical topologies on real hw as well x86/numa: Hard partition cpu topology masks on node boundaries x86/numa: Allow specifying node_distance() for numa=fake x86/sched: Make mwait_usable() heed to "idle=" kernel parameters properly sched: Update documentation and comments sched_rt: Avoid unnecessary dequeue and enqueue of pushable tasks in set_cpus_allowed_rt()	2012-05-22 18:27:32 -07:00
Jan Beulich	20167d3421	x86-64: Fix accounting in kernel_physical_mapping_init() When finding a present and acceptable 2M/1G mapping, the number of pages mapped this way shouldn't be incremented (as it was already incremented when the earlier part of the mapping was established). Instead, last_map_addr needs to be updated in this case. Further, address increments were wrong in one place each in both phys_pmd_init() and phys_pud_init() (lacking the aligning down to the respective page boundary). As we're now doing the same calculation several times, fold it into a single instance using a local variable (matching how kernel_physical_mapping_init() itself does it at the PGD level). Observed during code inspection, not because of an actual problem. Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/4FB3C27202000078000841A0@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-05-18 10:13:37 +02:00
Alex Shi	c6ae41e7d4	x86: replace percpu_xxx funcs with this_cpu_xxx Since percpu_xxx() serial functions are duplicated with this_cpu_xxx(). Removing percpu_xxx() definition and replacing them by this_cpu_xxx() in code. There is no function change in this patch, just preparation for later percpu_xxx serial function removing. On x86 machine the this_cpu_xxx() serial functions are same as __this_cpu_xxx() without no unnecessary premmpt enable/disable. Thanks for Stephen Rothwell, he found and fixed a i386 build error in the patch. Also thanks for Andrew Morton, he kept updating the patchset in Linus' tree. Signed-off-by: Alex Shi <alex.shi@intel.com> Acked-by: Christoph Lameter <cl@gentwo.org> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2012-05-14 14:15:31 -07:00
Peter Zijlstra	94c0dd3278	x86/numa: Allow specifying node_distance() for numa=fake Allows emulating more interesting NUMA configurations like a quad socket AMD Magny-Cour: "numa=fake=8:10,16,16,22,16,22,16,22, 16,10,22,16,22,16,22,16, 16,22,10,16,16,22,16,22, 22,16,16,10,22,16,22,16, 16,22,16,22,10,16,16,22, 22,16,22,16,16,10,22,16, 16,22,16,22,16,22,10,16, 22,16,22,16,22,16,16,10" Which has a non-fully-connected topology. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/n/tip-e1136ef7kdffj7yf9tjhydln@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-05-09 13:28:59 +02:00
Eric W. Biederman	078de5f706	userns: Store uid and gid values in struct cred with kuid_t and kgid_t types cred.h and a few trivial users of struct cred are changed. The rest of the users of struct cred are left for other patches as there are too many changes to make in one go and leave the change reviewable. If the user namespace is disabled and CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled the code will contiue to compile and behave correctly. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2012-05-03 03:28:38 -07:00
H. Peter Anvin	706276543b	x86, extable: Switch to relative exception table entries Switch to using relative exception table entries on x86. On i386, this has the advantage that the exception table entries don't need to be relocated; on x86-64 this means the exception table entries take up only half the space. In either case, a 32-bit delta is sufficient, as the range of kernel code addresses is limited. Since part of the goal is to avoid needing to adjust the entries when the kernel is relocated, the old trick of using addresses in the NULL pointer range to indicate uaccess_err no longer works (and unlike RISC architectures we can't use a flag bit); instead use an delta just below +2G to indicate these special entries. The reach is still limited to a single instruction. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: David Daney <david.daney@cavium.com> Link: http://lkml.kernel.org/r/CA%2B55aFyijf43qSu3N9nWHEBwaGbb7T2Oq9A=9EyR=Jtyqfq_cQ@mail.gmail.com	2012-04-20 17:22:34 -07:00
H. Peter Anvin	6a1ea279c2	x86, extable: Add early_fixup_exception() Add a restricted version of fixup_exception() to be used during early boot only. In particular, this doesn't support the try..catch variant since we may not have a thread_info set up yet. This relies on the exception table being sorted already at build time. Link: http://lkml.kernel.org/r/1334794610-5546-1-git-send-email-hpa@zytor.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2012-04-19 15:31:04 -07:00
Linus Torvalds	eb05df9e7e	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Peter Anvin: "The biggest textual change is the cleanup to use symbolic constants for x86 trap values. The only functional change and the reason for the x86/x32 dependency is the move of is_ia32_task() into <asm/thread_info.h> so that it can be used in other code that needs to understand if a system call comes from the compat entry point (and therefore uses i386 system call numbers) or not. One intended user for that is the BPF system call filter. Moving it out of <asm/compat.h> means we can define it unconditionally, returning always true on i386." * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move is_ia32_task to asm/thread_info.h from asm/compat.h x86: Rename trap_no to trap_nr in thread_struct x86: Use enum instead of literals for trap values	2012-03-29 18:21:35 -07:00
Linus Torvalds	6b8212a313	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Ingo Molnar. This touches some non-x86 files due to the sanitized INLINE_SPIN_UNLOCK config usage. Fixed up trivial conflicts due to just header include changes (removing headers due to cpu_idle() merge clashing with the <asm/system.h> split). * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/amd: Be more verbose about LVT offset assignments x86, tls: Off by one limit check x86/ioapic: Add io_apic_ops driver layer to allow interception x86/olpc: Add debugfs interface for EC commands x86: Merge the x86_32 and x86_64 cpu_idle() functions x86/kconfig: Remove CONFIG_TR=y from the defconfigs x86: Stop recursive fault in print_context_stack after stack overflow x86/io_apic: Move and reenable irq only when CONFIG_GENERIC_PENDING_IRQ=y x86/apic: Add separate apic_id_valid() functions for selected apic drivers locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage x86/kconfig: Update defconfigs x86: Fix excessive MSR print out when show_msr is not specified	2012-03-29 14:28:26 -07:00
Linus Torvalds	0195c00244	Disintegrate and delete asm/system.h -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIVAwUAT3NKzROxKuMESys7AQKElw/+JyDxJSlj+g+nymkx8IVVuU8CsEwNLgRk 8KEnRfLhGtkXFLSJYWO6jzGo16F8Uqli1PdMFte/wagSv0285/HZaKlkkBVHdJ/m u40oSjgT013bBh6MQ0Oaf8pFezFUiQB5zPOA9QGaLVGDLXCmgqUgd7exaD5wRIwB ZmyItjZeAVnDfk1R+ZiNYytHAi8A5wSB+eFDCIQYgyulA1Igd1UnRtx+dRKbvc/m rWQ6KWbZHIdvP1ksd8wHHkrlUD2pEeJ8glJLsZUhMm/5oMf/8RmOCvmo8rvE/qwl eDQ1h4cGYlfjobxXZMHqAN9m7Jg2bI946HZjdb7/7oCeO6VW3FwPZ/Ic75p+wp45 HXJTItufERYk6QxShiOKvA+QexnYwY0IT5oRP4DrhdVB/X9cl2MoaZHC+RbYLQy+ /5VNZKi38iK4F9AbFamS7kd0i5QszA/ZzEzKZ6VMuOp3W/fagpn4ZJT1LIA3m4A9 Q0cj24mqeyCfjysu0TMbPtaN+Yjeu1o1OFRvM8XffbZsp5bNzuTDEvviJ2NXw4vK 4qUHulhYSEWcu9YgAZXvEWDEM78FXCkg2v/CrZXH5tyc95kUkMPcgG+QZBB5wElR FaOKpiC/BuNIGEf02IZQ4nfDxE90QwnDeoYeV+FvNj9UEOopJ5z5bMPoTHxm4cCD NypQthI85pc= =G9mT -----END PGP SIGNATURE----- Merge tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system Pull "Disintegrate and delete asm/system.h" from David Howells: "Here are a bunch of patches to disintegrate asm/system.h into a set of separate bits to relieve the problem of circular inclusion dependencies. I've built all the working defconfigs from all the arches that I can and made sure that they don't break. The reason for these patches is that I recently encountered a circular dependency problem that came about when I produced some patches to optimise get_order() by rewriting it to use ilog2(). This uses bitops - and on the SH arch asm/bitops.h drags in asm-generic/get_order.h by a circuituous route involving asm/system.h. The main difficulty seems to be asm/system.h. It holds a number of low level bits with no/few dependencies that are commonly used (eg. memory barriers) and a number of bits with more dependencies that aren't used in many places (eg. switch_to()). These patches break asm/system.h up into the following core pieces: (1) asm/barrier.h Move memory barriers here. This already done for MIPS and Alpha. (2) asm/switch_to.h Move switch_to() and related stuff here. (3) asm/exec.h Move arch_align_stack() here. Other process execution related bits could perhaps go here from asm/processor.h. (4) asm/cmpxchg.h Move xchg() and cmpxchg() here as they're full word atomic ops and frequently used by atomic_xchg() and atomic_cmpxchg(). (5) asm/bug.h Move die() and related bits. (6) asm/auxvec.h Move AT_VECTOR_SIZE_ARCH here. Other arch headers are created as needed on a per-arch basis." Fixed up some conflicts from other header file cleanups and moving code around that has happened in the meantime, so David's testing is somewhat weakened by that. We'll find out anything that got broken and fix it.. * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits) Delete all instances of asm/system.h Remove all #inclusions of asm/system.h Add #includes needed to permit the removal of asm/system.h Move all declarations of free_initmem() to linux/mm.h Disintegrate asm/system.h for OpenRISC Split arch_align_stack() out from asm-generic/system.h Split the switch_to() wrapper out of asm-generic/system.h Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h Create asm-generic/barrier.h Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h Disintegrate asm/system.h for Xtensa Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt] Disintegrate asm/system.h for Tile Disintegrate asm/system.h for Sparc Disintegrate asm/system.h for SH Disintegrate asm/system.h for Score Disintegrate asm/system.h for S390 Disintegrate asm/system.h for PowerPC Disintegrate asm/system.h for PA-RISC Disintegrate asm/system.h for MN10300 ...	2012-03-28 15:58:21 -07:00
David Howells	f05e798ad4	Disintegrate asm/system.h for X86 Disintegrate asm/system.h for X86. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: H. Peter Anvin <hpa@zytor.com> cc: x86@kernel.org	2012-03-28 18:11:12 +01:00
Linus Torvalds	ed2d265d12	The following text was taken from the original review request: "[RFC - PATCH 0/7] consolidation of BUG support code." https://lkml.org/lkml/2012/1/26/525 -- The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: *** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPbNwpAAoJEOvOhAQsB9HWrqYP/A0t9VB0nK6e42F0OR2P14MZ GJFtf1B++wwioIrx+KSWSRfSur1C5FKhDbxLR3I/pvkAYl4+T4JvRdMG6xJwxyip CC1kVQQNDjWVVqzjz2x6rYkOffx6dUlw/ERyIyk+OzP+1HzRIsIrugMqbzGLlX0X y0v2Tbd0G6xg1DV8lcRdp95eIzcGuUvdb2iY2LGadWZczEOeSXx64Jz3QCFxg3aL LFU4oovsg8Nb7MRJmqDvHK/oQf5vaTm9WSrS0pvVte0msSQRn8LStYdWC0G9BPCS GwL86h/eLXlUXQlC5GpgWg1QQt5i2QpjBFcVBIG0IT5SgEPMx+gXyiqZva2KwbHu LKicjKtfnzPitQnyEV/N6JyV1fb1U6/MsB7ebU5nCCzt9Gr7MYbjZ44peNeprAtu HMvJ/BNnRr4Ha6nPQNu952AdASPKkxmeXFUwBL1zUbLkOX/bK/vy1ujlcdkFxCD7 fP3t7hghYa737IHk0ehUOhrE4H67hvxTSCKioLUAy/YeN1IcfH/iOQiCBQVLWmoS AqYV6ou9cqgdYoyila2UeAqegb+8xyubPIHt+lebcaKxs5aGsTg+r3vq5juMDAPs iwSVYUDcIw9dHer1lJfo7QCy3QUTRDTxh+LB9VlHXQICgeCK02sLBOi9hbEr4/H8 Ko9g8J3BMxcMkXLHT9ud =PYQT -----END PGP SIGNATURE----- Merge tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull <linux/bug.h> cleanup from Paul Gortmaker: "The changes shown here are to unify linux's BUG support under the one <linux/bug.h> file. Due to historical reasons, we have some BUG code in bug.h and some in kernel.h -- i.e. the support for BUILD_BUG in linux/kernel.h predates the addition of linux/bug.h, but old code in kernel.h wasn't moved to bug.h at that time. As a band-aid, kernel.h was including <asm/bug.h> to pseudo link them. This has caused confusion[1] and general yuck/WTF[2] reactions. Here is an example that violates the principle of least surprise: CC lib/string.o lib/string.c: In function 'strlcat': lib/string.c:225:2: error: implicit declaration of function 'BUILD_BUG_ON' make[2]: ** [lib/string.o] Error 1 $ $ grep linux/bug.h lib/string.c #include <linux/bug.h> $ We've included <linux/bug.h> for the BUG infrastructure and yet we still get a compile fail! [We've not kernel.h for BUILD_BUG_ON.] Ugh - very confusing for someone who is new to kernel development. With the above in mind, the goals of this changeset are: 1) find and fix any include/.h files that were relying on the implicit presence of BUG code. 2) find and fix any C files that were consuming kernel.h and hence relying on implicitly getting some/all BUG code. 3) Move the BUG related code living in kernel.h to <linux/bug.h> 4) remove the asm/bug.h from kernel.h to finally break the chain. During development, the order was more like 3-4, build-test, 1-2. But to ensure that git history for bisect doesn't get needless build failures introduced, the commits have been reorderd to fix the problem areas in advance. [1] https://lkml.org/lkml/2012/1/3/90 [2] https://lkml.org/lkml/2012/1/17/414" Fix up conflicts (new radeon file, reiserfs header cleanups) as per Paul and linux-next. tag 'bug-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: kernel.h: doesn't explicitly use bug.h, so don't include it. bug: consolidate BUILD_BUG_ON with other bug code BUG: headers with BUG/BUG_ON etc. need linux/bug.h bug.h: add include of it to various implicit C users lib: fix implicit users of kernel.h for TAINT_WARN spinlock: macroize assert_spin_locked to avoid bug.h dependency x86: relocate get/set debugreg fcns to include/asm/debugreg.	2012-03-24 10:08:39 -07:00
Steffen Persvold	b7157acf42	x86/apic: Add separate apic_id_valid() functions for selected apic drivers As suggested by Suresh Siddha and Yinghai Lu: For x2apic pre-enabled systems, apic driver is set already early through early_acpi_boot_init()/early_acpi_process_madt()/ acpi_parse_madt()/default_acpi_madt_oem_check() path so that apic_id_valid() checking will be sufficient during MADT and SRAT parsing. For non-x2apic pre-enabled systems, all apic ids should be less than 255. This allows us to substitute the checks in arch/x86/kernel/acpi/boot.c::acpi_parse_x2apic() and arch/x86/mm/srat.c::acpi_numa_x2apic_affinity_init() with apic->apic_id_valid(). In addition we can avoid feigning the x2apic cpu feature in the NumaChip apic code. The following apic drivers have separate apic_id_valid() functions which will accept x2apic type IDs : x2apic_phys x2apic_cluster x2apic_uv_x apic_numachip Signed-off-by: Steffen Persvold <sp@numascale.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Daniel J Blueman <daniel@numascale-asia.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1331925935-13372-1-git-send-email-sp@numascale.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-03-23 13:28:43 +01:00
Suresh Siddha	a6fca40f1d	x86, tlb: Switch cr3 in leave_mm() only when needed Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir. But there is no need to change the cr3, if we already left that mm. intel_idle() for example calls leave_mm() on every deep c-state entry where the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling leave_mm() whenever the TLB is in LAZY state. Both these paths will be improved with this change. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Link: http://lkml.kernel.org/r/1332460885.16101.147.camel@sbsiddha-desk.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-03-22 17:23:48 -07:00
Linus Torvalds	28f23d1f3b	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 "urgent" leftovers from Ingo Molnar: "Pending x86/urgent bits that were not high prio enough to warrant -rc-less v3.3-final inclusion." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, efi: Fix pointer math issue in handle_ramdisks() x86/ioapic: Add register level checks to detect bogus io-apic entries x86, mce: Fix rcu splat in drain_mce_log_buffer() x86, memblock: Move mem_hole_size() to .init	2012-03-22 09:44:50 -07:00
Andrea Arcangeli	d71b5a73fe	numa_emulation: fix cpumask_of_node() Without this fix the cpumask_of_node() for a fake=numa=2 is: cpumask 0 ff cpumask 1 ff with the fix it's correct and it's set to: cpumask 0 55 cpumask 1 aa Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Johannes Weiner <jweiner@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:55:00 -07:00
Xiao Guangrong	b69add218d	hugetlb: remove prev_vma from hugetlb_get_unmapped_area_topdown() After looking up the vma which covers or follows the cached search address, the following condition is always true: !prev_vma \|\| (addr >= prev_vma->vm_end) so we can stop checking the previous VMA altogether. Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:54:59 -07:00
Xiao Guangrong	cbde83e21c	hugetlb: try to search again if it is really needed Search again only if some holes may be skipped in the first pass. [akpm@linux-foundation.org: clean up crazy compound definition] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Hillf Danton <dhillf@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-21 17:54:56 -07:00
Cong Wang	a24401bcf4	highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename] Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Cong Wang <amwang@redhat.com>	2012-03-20 21:48:30 +08:00
Srikar Dronamraju	51e7dc7011	x86: Rename trap_no to trap_nr in thread_struct There are precedences of trap number being referred to as trap_nr. However thread struct refers trap number as trap_no. Change it to trap_nr. Also use enum instead of left-over literals for trap values. This is pure cleanup, no functional change intended. Suggested-by: Ingo Molnar <mingo@eltu.hu> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com> Cc: Linux-mm <linux-mm@kvack.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com [ Fixed the math-emu build ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-03-13 06:24:09 +01:00
Linus Torvalds	55062d0617	x86: fix typo in recent find_vma_prev purge It turns out that test-compiling this file on x86-64 doesn't really help, because much of it is x86-32-specific. And so I hadn't noticed the slightly over-eager removal of the 'r' from 'addr' variable despite thinking I had tested it. Signed-off-by: Linus "oopsie" Torvalds <torvalds@linux-foundation.org>	2012-03-06 18:48:13 -08:00
Linus Torvalds	097d59106a	vm: avoid using find_vma_prev() unnecessarily Several users of "find_vma_prev()" were not in fact interested in the previous vma if there was no primary vma to be found either. And in those cases, we're much better off just using the regular "find_vma()", and then "prev" can be looked up by just checking vma->vm_prev. The find_vma_prev() semantics are fairly subtle (see Mikulas' recent commit `83cd904d27`: "mm: fix find_vma_prev"), and the whole "return prev by reference" means that it generates worse code too. Thus this "let's avoid using this inconvenient and clearly too subtle interface when we don't really have to" patch. Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-03-06 18:23:36 -08:00
WANG Cong	722bc6b167	x86/mm: Fix the size calculation of mapping tables For machines that enable PSE, the first 2/4M memory region still uses 4K pages, so needs more PTEs in this case, but find_early_table_space() doesn't count this. This patch fixes it. The bug was found via code review, no misbehavior of the kernel was observed. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: <ianfang.cn@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-kq6a00qe33h7c7ais2xsywnh@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-03-06 09:38:26 +01:00
Jiri Kosina	e37aade316	x86, memblock: Move mem_hole_size() to .init mem_hole_size() is being called only from __init-marked functions, and as such should be moved to .init section as well. Fixes this warning: WARNING: vmlinux.o(.text+0x35511): Section mismatch in reference from the function mem_hole_size() to the function .init.text:absent_pages_in_range() Signed-off-by: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1202281614450.31150@pobox.suse.cz Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-03-03 15:51:20 -08:00
Paul Gortmaker	50af5ead3b	bug.h: add include of it to various implicit C users With bug.h currently living right in linux/kernel.h there are files that use BUG_ON and friends but are not including the header explicitly. Fix them up so we can remove the presence in kernel.h file. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2012-02-29 17:15:08 -05:00
Linus Torvalds	2f2fde9272	Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus', 'sched-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: bugs, x86: Fix printk levels for panic, softlockups and stack dumps * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf top: Fix number of samples displayed perf tools: Fix strlen() bug in perf_event__synthesize_event_type() perf tools: Fix broken build by defining _GNU_SOURCE in Makefile x86/dumpstack: Remove unneeded check in dump_trace() perf: Fix broken interrupt rate throttling * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt: Fix task stack corruption under __ARCH_WANT_INTERRUPTS_ON_CTXSW sched: Fix ancient race in do_exit() sched/nohz: Fix nohz cpu idle load balancing state with cpu hotplug sched/s390: Fix compile error in sched/core.c sched: Fix rq->nr_uninterruptible update race * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/reboot: Remove VersaLogic Menlow reboot quirk x86/reboot: Skip DMI checks if reboot set by user x86: Properly parenthesize cmpxchg() macro arguments	2012-02-02 11:11:13 -08:00
Prarit Bhargava	b0f4c4b32c	bugs, x86: Fix printk levels for panic, softlockups and stack dumps rsyslog will display KERN_EMERG messages on a connected terminal. However, these messages are useless/undecipherable for a general user. For example, after a softlockup we get: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Stack: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Call Trace: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89 This happens because the printk levels for these messages are incorrect. Only an informational message should be displayed on a terminal. I modified the printk levels for various messages in the kernel and tested the output by using the drivers/misc/lkdtm.c kernel modules (ie, softlockups, panics, hard lockups, etc.) and confirmed that the console output was still the same and that the output to the terminals was correct. For example, in the case of a softlockup we now see the much more informative: Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ... BUG: soft lockup - CPU4 stuck for 60s! instead of the above confusing messages. AFAICT, the messages no longer have to be KERN_EMERG. In the most important case of a panic we set console_verbose(). As for the other less severe cases the correct data is output to the console and /var/log/messages. Successfully tested by me using the drivers/misc/lkdtm.c module. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: dzickus@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-26 21:28:45 +01:00
Linus Torvalds	507a03c1cb	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux This includes initial support for the recently published ACPI 5.0 spec. In particular, support for the "hardware-reduced" bit that eliminates the dependency on legacy hardware. APEI has patches resulting from testing on real hardware. Plus other random fixes. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (52 commits) acpi/apei/einj: Add extensions to EINJ from rev 5.0 of acpi spec intel_idle: Split up and provide per CPU initialization func ACPI processor: Remove unneeded variable passed by acpi_processor_hotadd_init V2 ACPI processor: Remove unneeded cpuidle_unregister_driver call intel idle: Make idle driver more robust intel_idle: Fix a cast to pointer from integer of different size warning in intel_idle ACPI: kernel-parameters.txt : Add intel_idle.max_cstate intel_idle: remove redundant local_irq_disable() call ACPI processor: Fix error path, also remove sysdev link ACPI: processor: fix acpi_get_cpuid for UP processor intel_idle: fix API misuse ACPI APEI: Convert atomicio routines ACPI: Export interfaces for ioremapping/iounmapping ACPI registers ACPI: Fix possible alignment issues with GAS 'address' references ACPI, ia64: Use SRAT table rev to use 8bit or 16/32bit PXM fields (ia64) ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64) ACPI: Store SRAT table revision ACPI, APEI, Resolve false conflict between ACPI NVS and APEI ACPI, Record ACPI NVS regions ACPI, APEI, EINJ, Refine the fix of resource conflict ...	2012-01-18 15:51:48 -08:00
Kurt Garloff	cd298f60a2	ACPI, x86: Use SRAT table rev to use 8bit or 32bit PXM fields (x86/x86-64) In SRAT v1, we had 8bit proximity domain (PXM) fields; SRAT v2 provides 32bits for these. The new fields were reserved before. According to the ACPI spec, the OS must disregrard reserved fields. x86/x86-64 was rather inconsistent prior to this patch; it used 8 bits for the pxm field in cpu_affinity, but 32 bits in mem_affinity. This patch makes it consistent: Either use 8 bits consistently (SRAT rev 1 or lower) or 32 bits (SRAT rev 2 or higher). cc: x86@kernel.org Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com>	2012-01-17 04:20:31 -05:00
Linus Torvalds	0a80939b3e	Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJPD2aFAAoJENkgDmzRrbjxNzsQAIeYbbrXYLjr6kQzUSngj/eC FzjaTEfYTQIeuQCFJHcHthyc5lXV4sQbo3jOezW+Bp5yuDJL2aWIHesSfWZe7imu zQdM4VshOYdAmUR9Q0AW5zhB8Smbs7/AyABiF2jm4p0ZPOuyMDSlei9sjvE9Vjvt B7g5ht7L6kz0JbDnwwy0u5gs+tEitwpXYId9Y4ysZIBzIbL0qkPX8veOddGTMy0N 8xhWXaKtufpjvxFD2ORLDsw3AkoF1xXSNuFd/5nzCNpbeE7TW931jfkPoqJumuAO 7GLxcU9kKYl+IICobC6wBtsj/RrB7w+cBXMvPGwdBliam1qaRhUcJZi5FLM/Ha5d 2A9QDYNUpoXiO8JbPXrV9Z+Y0+Co8RilsQj7R/rjZh6AbbYCWt9nxzx2Svl/RfTr xfiimHuB2P3rHjOvpCXULwOOuE5c8MzPuWncpdjiD3uGXOY/aY+X1m+if/quJw9D grPlKL0+YiRakEYUeGG4M77KCqyKFZaF7L7UQPbqfZcj8V/9AW3/7U5I/B9RlAjs idsr4fcf5s0N+oKUyTCW1ncpUDQNiwbU2NyJQqeu1ZxaRGj72AgyvsaNeyIPDyK+ f6x95Bi7i8KLjXc9Z1KvJwh2Nxt25gNUiTYVha/9H2NpJGd1cfI15kTOGXrgddVv 1pvuGcJDZwYiwfiXr3FL =HHrh -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://github.com/rustyrussell/linux Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1 * tag 'for-linus' of git://github.com/rustyrussell/linux: module_param: check that bool parameters really are bool. intelfbdrv.c: bailearly is an int module_param paride/pcd: fix bool verbose module parameter. module_param: make bool parameters really bool (drivers & misc) module_param: make bool parameters really bool (arch) module_param: make bool parameters really bool (core code) kernel/async: remove redundant declaration. printk: fix unnecessary module_param_name. lirc_parallel: fix module parameter description. module_param: avoid bool abuse, add bint for special cases. module_param: check type correctness for module_param_array modpost: use linker section to generate table. modpost: use a table rather than a giant if/else statement. modules: sysfs - export: taint, coresize, initsize kernel/params: replace DEBUGP with pr_debug module: replace DEBUGP with pr_debug module: struct module_ref should contains long fields module: Fix performance regression on modules with large symbol tables module: Add comments describing how the "strmap" logic works Fix up conflicts in scripts/mod/file2alias.c due to the new linker- generated table approach to adding __mod_*_device_table entries. The ARM sa11x0 mcp bus needed to be converted to that too.	2012-01-14 12:32:16 -08:00
Wanlong Gao	9512938b88	cpumask: update setup_node_to_cpumask_map() comments node_to_cpumask() has been replaced by cpumask_of_node(), and wholly removed since commit `29c337a0` ("cpumask: remove obsolete node_to_cpumask now everyone uses cpumask_of_node"). So update the comments for setup_node_to_cpumask_map(). Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-01-12 20:13:11 -08:00
Rusty Russell	476bc0015b	module_param: make bool parameters really bool (arch) module_param(bool) used to counter-intuitively take an int. In `fddd5201` (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2012-01-13 09:32:18 +10:30
Linus Torvalds	d0b9706c20	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/numa: Add constraints check for nid parameters mm, x86: Remove debug_pagealloc_enabled x86/mm: Initialize high mem before free_all_bootmem() arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map() x86: Fix mmap random address range x86, mm: Unify zone_sizes_init() x86, mm: Prepare zone_sizes_init() for unification x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32 x86, mm: Use max_pfn instead of highend_pfn x86, mm: Move zone init from paging_init() on 64-bit x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit	2012-01-11 19:12:10 -08:00
Linus Torvalds	cf3f33551b	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Use "do { } while(0)" for empty lock_cmos()/unlock_cmos() macros x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro x86, CPU: Drop superfluous get_cpu_cap() prototype arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static arch/x86/kernel/ptrace.c: Quiet sparse noise x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation x86: Replace the EVT_TO_HPET_DEV() macro with an inline function	2012-01-06 14:00:12 -08:00
Linus Torvalds	69734b644b	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) x86: Fix atomic64_xxx_cx8() functions x86: Fix and improve cmpxchg_double{,_local}() x86_64, asm: Optimise fls(), ffs() and fls64() x86, bitops: Move fls64.h inside __KERNEL__ x86: Fix and improve percpu_cmpxchg{8,16}b_double() x86: Report cpb and eff_freq_ro flags correctly x86/i386: Use less assembly in strlen(), speed things up a bit x86: Use the same node_distance for 32 and 64-bit x86: Fix rflags in FAKE_STACK_FRAME x86: Clean up and extend do_int3() x86: Call do_notify_resume() with interrupts enabled x86/div64: Add a micro-optimization shortcut if base is power of two x86-64: Cleanup some assembly entry points x86-64: Slightly shorten line system call entry and exit paths x86-64: Reduce amount of redundant code generated for invalidate_interruptNN x86-64: Slightly shorten int_ret_from_sys_call x86, efi: Convert efi_phys_get_time() args to physical addresses x86: Default to vsyscall=emulate x86-64: Set siginfo and context on vsyscall emulation faults x86: consolidate xchg and xadd macros ...	2012-01-06 13:59:14 -08:00
Linus Torvalds	67b0243131	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Skip cpus with apic-ids >= 255 in !x2apic_mode x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present x86, apic: Add probe() for apic_flat x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86' x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat x86: Add per-cpu stat counter for APIC ICR read tries pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86 x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support x86: Add NumaChip support x86: Add x86_init platform override to fix up NUMA core numbering x86: Make flat_init_apic_ldr() available	2012-01-06 13:58:21 -08:00
Ingo Molnar	adaf4ed2ab	Merge commit 'v3.2-rc7' into x86/asm Merge reason: Update from -rc4 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2012-01-04 15:01:28 +01:00
Yinghai Lu	a35fd28256	x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present If the x2apic feature is not present (either the cpu is not capable of it or the user has disabled the feature using boot-parameter etc), ignore the x2apic MADT and SRAT entries provided by the ACPI tables. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/20111222014632.540896503@sbsiddha-desk.sc.intel.com Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-12-23 11:00:50 -08:00
Ingo Molnar	45aa0663cc	Merge branch 'memblock-kill-early_node_map' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/memblock	2011-12-20 12:14:26 +01:00
Youquan Song	b6999b1912	thp: add compound tail page _mapcount when mapped With the 3.2-rc kernel, IOMMU 2M pages in KVM works. But when I tried to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page failed to be used. The root cause is that 1GB page allocation calls gup_huge_pud() while 2M page calls gup_huge_pmd. If compound pages are used and the page is a tail page, gup_huge_pmd() increases _mapcount to record tail page are mapped while gup_huge_pud does not do that. So when the mapped page is relesed, it will result in kernel oops because the page is not marked mapped. This patch add tail process for compound page in 1GB huge page which keeps the same process as 2M page. Reproduce like: 1. Add grub boot option: hugepagesz=1G hugepages=8 2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages 3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages -net none -device pci-assign,host=07:00.1 kernel BUG at mm/swap.c:114! invalid opcode: 0000 [#1] SMP Call Trace: put_page+0x15/0x37 kvm_release_pfn_clean+0x31/0x36 kvm_iommu_put_pages+0x94/0xb1 kvm_iommu_unmap_memslots+0x80/0xb6 kvm_assign_device+0xba/0x117 kvm_vm_ioctl_assigned_device+0x301/0xa47 kvm_vm_ioctl+0x36c/0x3a2 do_vfs_ioctl+0x49e/0x4e4 sys_ioctl+0x5a/0x7c system_call_fastpath+0x16/0x1b RIP put_compound_page+0xd4/0x168 Signed-off-by: Youquan Song <youquan.song@intel.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-12-09 07:50:28 -08:00
Petr Holasek	54eed6cb16	x86/numa: Add constraints check for nid parameters This patch adds constraint checks to the numa_set_distance() function. When the check triggers (this should not happen normally) it emits a warning and avoids a store to a negative index in numa_distance[] array - i.e. avoids memory corruption. Negative ids can be passed when the pxm-to-nids mapping is not properly filled while parsing the SRAT. Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Anton Arapov <anton@redhat.com> Link: http://lkml.kernel.org/r/20111208121640.GA2229@dhcp-27-244.brq.redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-09 08:03:34 +01:00
Stanislaw Gruszka	54c29c635a	mm, x86: Remove debug_pagealloc_enabled When (no)bootmem finish operation, it pass pages to buddy allocator. Since debug_pagealloc_enabled is not set, we will do not protect pages, what is not what we want with CONFIG_DEBUG_PAGEALLOC=y. To fix remove debug_pagealloc_enabled. That variable was introduced by commit `12d6f21e` "x86: do not PSE on CONFIG_DEBUG_PAGEALLOC=y" to get more CPA (change page attribude) code testing. But currently we have CONFIG_CPA_DEBUG, which test CPA. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1322582711-14571-1-git-send-email-sgruszka@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 09:24:07 +01:00
Stanislaw Gruszka	855c743a27	x86/mm: Initialize high mem before free_all_bootmem() Patch fixes a boot crash with pagealloc debugging enabled: Initializing HighMem for node 0 (000377fe:0003fff0) BUG: unable to handle kernel paging request at f6fefe80 IP: [<c1621ab5>] find_range_array+0x5e/0x69 [...] Call Trace: [<c1622064>] __get_free_all_memory_range+0x39/0xb4 [<c1620dd0>] add_highpages_with_active_regions+0x18/0x9b [<c1621a2e>] set_highmem_pages_init+0x70/0x90 [<c162122b>] mem_init+0x50/0x21b [<c16155bd>] start_kernel+0x1bf/0x31c [<c1615065>] i386_start_kernel+0x65/0x67 The crash happens when memblock wants to allocate big area for temporary "struct range" array and reuses pages from top of low memory, which were already passed to the buddy allocator. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: linux-mm@kvack.org Cc: Mel Gorman <mgorman@suse.de> Link: http://lkml.kernel.org/r/20111206080833.GB3105@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-06 09:23:53 +01:00
H Hartley Sweeten	2d070eff6b	arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static Local functions should be marked static. This also quiets the following sparse noise: warning: symbol '_set_memory_array' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: hartleys@visionengravers.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:11:05 +01:00
Ludwig Nussel	9af0c7a6fa	x86: Fix mmap random address range On x86_32 casting the unsigned int result of get_random_int() to long may result in a negative value. On x86_32 the range of mmap_rnd() therefore was -255 to 255. The 32bit mode on x86_64 used 0 to 255 as intended. The bug was introduced by `675a081` ("x86: unify mmap_{32\|64}.c") in January 2008. Signed-off-by: Ludwig Nussel <ludwig.nussel@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: harvey.harrison@gmail.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/201111152246.pAFMklOB028527@wpaz5.hot.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:07:23 +01:00
Konrad Rzeszutek Wilk	2cd1c8d4dc	x86/paravirt: PTE updates in k(un)map_atomic need to be synchronous, regardless of lazy_mmu mode Fix an outstanding issue that has been reported since 2.6.37. Under a heavy loaded machine processing "fork()" calls could crash with: BUG: unable to handle kernel paging request at f573fc8c IP: [<c01abc54>] swap_count_continued+0x104/0x180 pdpt = 000000002a3b9027 pde = 0000000001bed067 pte = 0000000000000000 Oops: 0000 [#1] SMP Modules linked in: Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1 EIP: 0061:[<c01abc54>] EFLAGS: 00210246 CPU: 3 EIP is at swap_count_continued+0x104/0x180 .. snip.. Call Trace: [<c01ac222>] ? __swap_duplicate+0xc2/0x160 [<c01040f7>] ? pte_mfn_to_pfn+0x87/0xe0 [<c01ac2e4>] ? swap_duplicate+0x14/0x40 [<c01a0a6b>] ? copy_pte_range+0x45b/0x500 [<c01a0ca5>] ? copy_page_range+0x195/0x200 [<c01328c6>] ? dup_mmap+0x1c6/0x2c0 [<c0132cf8>] ? dup_mm+0xa8/0x130 [<c013376a>] ? copy_process+0x98a/0xb30 [<c013395f>] ? do_fork+0x4f/0x280 [<c01573b3>] ? getnstimeofday+0x43/0x100 [<c010f770>] ? sys_clone+0x30/0x40 [<c06c048d>] ? ptregs_clone+0x15/0x48 [<c06bfb71>] ? syscall_call+0x7/0xb The problem is that in copy_page_range() we turn lazy mode on, and then in swap_entry_free() we call swap_count_continued() which ends up in: map = kmap_atomic(page, KM_USER0) + offset; and then later we touch map. Since we are running in batched mode (lazy) we don't actually set up the PTE mappings and the kmap_atomic is not done synchronously and ends up trying to dereference a page that has not been set. Looking at kmap_atomic_prot_pfn(), it uses 'arch_flush_lazy_mmu_mode' and doing the same in kmap_atomic_prot() and __kunmap_atomic() makes the problem go away. Interestingly, commit `b8bcfe997e` ("x86/paravirt: remove lazy mode in interrupts") removed part of this to fix an interrupt issue - but it went to far and did not consider this scenario. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 17:06:34 +01:00
Andy Lutomirski	4fc3490114	x86-64: Set siginfo and context on vsyscall emulation faults To make this work, we teach the page fault handler how to send signals on failed uaccess. This only works for user addresses (kernel addresses will never hit the page fault handler in the first place), so we need to generate signals for those separately. This gets the tricky case right: if the user buffer spans multiple pages and only the second page is invalid, we set cr2 and si_addr correctly. UML relies on this behavior to "fault in" pages as needed. We steal a bit from thread_info.uaccess_err to enable this. Before this change, uaccess_err was a 32-bit boolean value. This fixes issues with UML when vsyscall=emulate. Reported-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: richard -rw- weinberger <richard.weinberger@gmail.com> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-12-05 12:17:27 +01:00
Tejun Heo	d4bbf7e775	Merge branch 'master' into x86/memblock Conflicts & resolutions: * arch/x86/xen/setup.c `dc91c728fd` "xen: allow extra memory to be in multiple regions" `24aa07882b` "memblock, x86: Replace memblock_x86_reserve/free..." conflicted on xen_add_extra_mem() updates. The resolution is trivial as the latter just want to replace memblock_x86_reserve_range() with memblock_reserve(). * drivers/pci/intel-iommu.c `166e9278a3` "x86/ia64: intel-iommu: move to drivers/iommu/" `5dfe8660a3` "bootmem: Replace work_with_active_regions() with..." conflicted as the former moved the file under drivers/iommu/. Resolved by applying the chnages from the latter on the moved file. * mm/Kconfig `6661672053` "memblock: add NO_BOOTMEM config symbol" `c378ddd53f` "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option" conflicted trivially. Both added config options. Just letting both add their own options resolves the conflict. * mm/memblock.c `d1f0ece6cd` "mm/memblock.c: small function definition fixes" `ed7b56a799` "memblock: Remove memblock_memory_can_coalesce()" confliected. The former updates function removed by the latter. Resolution is trivial. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-11-28 09:46:22 -08:00
Pekka Enberg	1762391530	x86, mm: Unify zone_sizes_init() Now that zone_sizes_init() is identical on 32-bit and 64-bit, move the code to arch/x86/mm/init.c and use it for both architectures. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-7-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:55 +01:00
Pekka Enberg	248b52b97d	x86, mm: Prepare zone_sizes_init() for unification Make 32-bit and 64-bit zone_sizes_init() identical in preparation for unification. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-6-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:53 +01:00
Pekka Enberg	ece838b625	x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit 64-bit has no highmem so max_low_pfn is always the same as 'max_pfn'. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-5-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:50 +01:00
Pekka Enberg	80b3cac97b	x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32 In preparation for unifying 32-bit and 64-bit zone_sizes_init() make sure ZONE_DMA32 is wrapped in CONFIG_ZONE_DMA32. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Arun Sharma <asharma@fb.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-4-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:48 +01:00
Pekka Enberg	e4794640ca	x86, mm: Use max_pfn instead of highend_pfn The 'highend_pfn' variable is always set to 'max_pfn' so just use the latter directly. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-3-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:45 +01:00
Pekka Enberg	4c0b2e5f89	x86, mm: Move zone init from paging_init() on 64-bit This patch introduces a zone_sizes_init() helper function on 64-bit to make it more similar to 32-bit init. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-2-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:43 +01:00
Pekka Enberg	ff14c1d015	x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit Use MAX_DMA_PFN which represents the 16 MB ISA DMA limit on 32-bit x86 just like we do on 64-bit. Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Link: http://lkml.kernel.org/r/1320155902-10424-1-git-send-email-penberg@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-11-11 10:22:41 +01:00
Andrea Arcangeli	b35a35b556	thp: share get_huge_page_tail() This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:58 -07:00
Andrea Arcangeli	70b50f94f1	mm: thp: tail page refcounting fix Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-11-02 16:06:57 -07:00
Linus Torvalds	ca836a2543	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, doc: Remove int 0xcc from entry_64.S documentation x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.c Fix up trivial conflicts in arch/x86/mm/fault.c (asm/fixmap.h vs asm/vsyscall.h: both work, which to use? Whatever..)	2011-10-28 05:46:02 -07:00
Linus Torvalds	e34eb39c1c	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, amd: Include linux/elf.h since we use stuff from asm/elf.h x86: cache_info: Update calculation of AMD L3 cache indices x86: cache_info: Kill the atomic allocation in amd_init_l3_cache() x86: cache_info: Kill the moronic shadow struct x86: cache_info: Remove bogus free of amd_l3_cache data x86, amd: Include elf.h explicitly, prepare the code for the module.h split x86-32, amd: Move va_align definition to unbreak 32-bit build x86, amd: Move BSP code to cpu_dev helper x86: Add a BSP cpu_dev helper x86, amd: Avoid cache aliasing penalties on AMD family 15h	2011-10-28 05:03:12 -07:00
Linus Torvalds	7b86572a7a	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64: Fix CFI data for interrupt frames x86-64: Don't apply destructive erratum workaround on unaffected CPUs	2011-10-26 17:42:03 +02:00
Linus Torvalds	59e5253417	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits) MAINTAINERS: linux-m32r is moderated for non-subscribers linux@lists.openrisc.net is moderated for non-subscribers Drop default from "DM365 codec select" choice parisc: Kconfig: cleanup Kernel page size default Kconfig: remove redundant CONFIG_ prefix on two symbols cris: remove arch/cris/arch-v32/lib/nand_init.S microblaze: add missing CONFIG_ prefixes h8300: drop puzzling Kconfig dependencies MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers tty: drop superfluous dependency in Kconfig ARM: mxc: fix Kconfig typo 'i.MX51' Fix file references in Kconfig files aic7xxx: fix Kconfig references to READMEs Fix file references in drivers/ide/ thinkpad_acpi: Fix printk typo 'bluestooth' bcmring: drop commented out line in Kconfig btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888' doc: raw1394: Trivial typo fix CIFS: Don't free volume_info->UNC until we are entirely done with it. treewide: Correct spelling of successfully in comments ...	2011-10-25 12:11:02 +02:00
Takashi Iwai	8548c84da2	x86: Fix S4 regression Commit `4b239f458` ("x86-64, mm: Put early page table high") causes a S4 regression since 2.6.39, namely the machine reboots occasionally at S4 resume. It doesn't happen always, overall rate is about 1/20. But, like other bugs, once when this happens, it continues to happen. This patch fixes the problem by essentially reverting the memory assignment in the older way. Signed-off-by: Takashi Iwai <tiwai@suse.de> Cc: <stable@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Yinghai Lu <yinghai.lu@oracle.com> [ We'll hopefully find the real fix, but that's too late for 3.1 now ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-10-24 06:55:20 +02:00
Jan Beulich	e05139f256	x86-64: Don't apply destructive erratum workaround on unaffected CPUs Erratum 93 applies to AMD K8 CPUs only, and its workaround (forcing the upper 32 bits of %rip to all get set under certain conditions) is actually getting in the way of analyzing page faults occurring during EFI physical mode runtime calls (in particular the page table walk shown is completely unrelated to the actual fault). This is because typically EFI runtime code lives in the space between 2G and 4G, which - modulo the above manipulation - is likely to overlap with the kernel or modules area. While even for the other errata workarounds their taking effect could be limited to just the affected CPUs, none of them appears to be destructive, and they're generally getting called only outside of performance critical paths, so they're being left untouched. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/4E835FE30200007800058464@nat28.tlf.novell.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-09-28 19:04:48 +02:00
Jiri Kosina	e060c38434	Merge branch 'master' into for-next Fast-forward merge with Linus to be able to merge patches based on more recent version of the tree.	2011-09-15 15:08:18 +02:00
Jesper Juhl	469bfb4a50	Remove unneeded version.h include from arch/x86/ It was pointed out by 'make versioncheck' that the include of linux/version.h is not needed in arch/x86/mm/mmio-mod.c . This patch removes it. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2011-09-15 14:57:06 +02:00
H. Peter Anvin	fab1167c46	x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.c arch/x86/mm/fault.c now depend on having the symbol VSYSCALL_START defined, which is best handled by including <asm/fixmap.h> (it isn't unreasonable we may want other fixed addresses in this file in the future, and so it is cleaner than including <asm/vsyscall.h> directly.) This addresses an x86-64 allnoconfig build failure. On other configurations it was masked by an indirect path: <asm/smp.h> -> <asm/apic.h> -> <asm/fixmap.h> -> <asm/vsyscall.h> ... however, the first such include is conditional on CONFIG_X86_LOCAL_APIC. Originally-by: Randy Dunlap <rdunlap@xenotime.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/CA%2B55aFxsOMc9=p02r8-QhJ=h=Mqwckk4_Pnx9LQt5%2BfqMp_exQ@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-16 08:04:02 -07:00
Randy Dunlap	cedf03bd9a	x86: fix mm/fault.c build arch/x86/mm/fault.c needs to include asm/vsyscall.h to fix a build error: arch/x86/mm/fault.c: In function '__bad_area_nosemaphore': arch/x86/mm/fault.c:728: error: 'VSYSCALL_START' undeclared (first use in this function) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-08-15 19:10:50 -07:00
Linus Torvalds	06e727d2a5	Merge branch 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip: x86-64: Rework vsyscall emulation and add vsyscall= parameter x86-64: Wire up getcpu syscall x86: Remove unnecessary compile flag tweaks for vsyscall code x86-64: Add vsyscall:emulate_vsyscall trace event x86-64: Add user_64bit_mode paravirt op x86-64, xen: Enable the vvar mapping x86-64: Work around gold bug 13023 x86-64: Move the "user" vsyscall segment out of the data segment. x86-64: Pad vDSO to a page boundary	2011-08-12 20:46:24 -07:00
Andy Lutomirski	3ae36655b9	x86-64: Rework vsyscall emulation and add vsyscall= parameter There are three choices: vsyscall=native: Vsyscalls are native code that issues the corresponding syscalls. vsyscall=emulate (default): Vsyscalls are emulated by instruction fault traps, tested in the bad_area path. The actual contents of the vsyscall page is the same as the vsyscall=native case except that it's marked NX. This way programs that make assumptions about what the code in the page does will not be confused when they read that code. vsyscall=none: Trying to execute a vsyscall will segfault. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-10 19:26:46 -05:00
Borislav Petkov	9387f774d6	x86-32, amd: Move va_align definition to unbreak 32-bit build hpa reported that `dfb09f9b7a` breaks 32-bit builds with the following error message: /home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:437: undefined reference to `va_align' /home/hpa/kernel/linux-tip.cpu/arch/x86/kernel/cpu/amd.c:436: undefined reference to `va_align' This is due to the fact that va_align is a global in a 64-bit only compilation unit. Move it to mmap.c where it is visible to both subarches. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1312633899-1131-1-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-08-06 11:44:57 -07:00
Borislav Petkov	dfb09f9b7a	x86, amd: Avoid cache aliasing penalties on AMD family 15h This patch provides performance tuning for the "Bulldozer" CPU. With its shared instruction cache there is a chance of generating an excessive number of cache cross-invalidates when running specific workloads on the cores of a compute module. This excessive amount of cross-invalidations can be observed if cache lines backed by shared physical memory alias in bits [14:12] of their virtual addresses, as those bits are used for the index generation. This patch addresses the issue by clearing all the bits in the [14:12] slice of the file mapping's virtual address at generation time, thus forcing those bits the same for all mappings of a single shared library across processes and, in doing so, avoids instruction cache aliases. It also adds the command line option "align_va_addr=(32\|64\|on\|off)" with which virtual address alignment can be enabled for 32-bit or 64-bit x86 individually, or both, or be completely disabled. This change leaves virtual region address allocation on other families and/or vendors unaffected. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-05 12:26:44 -07:00
Andy Lutomirski	318f5a2a67	x86-64: Add user_64bit_mode paravirt op Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (`c9712944b2`), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <luto@mit.edu> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-08-04 16:13:49 -07:00
Arun Sharma	60063497a9	atomic: use <linux/atomic.h> This allows us to move duplicated code in <asm/atomic.h> (atomic_inc_not_zero() for now) to <linux/atomic.h> Signed-off-by: Arun Sharma <asharma@fb.com> Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-07-26 16:49:47 -07:00
Linus Torvalds	9e39264ed4	Merge branch 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: Implement pfn -> nid mapping granularity check x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/	2011-07-22 17:04:18 -07:00
Linus Torvalds	0a613b647b	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, smpboot: Mark the names[] array in __inquire_remote_apic() as const x86: Convert vmalloc()+memset() to vzalloc()	2011-07-22 17:02:38 -07:00
Linus Torvalds	4d4abdcb1d	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (123 commits) perf: Remove the nmi parameter from the oprofile_perf backend x86, perf: Make copy_from_user_nmi() a library function perf: Remove perf_event_attr::type check x86, perf: P4 PMU - Fix typos in comments and style cleanup perf tools: Make test use the preset debugfs path perf tools: Add automated tests for events parsing perf tools: De-opt the parse_events function perf script: Fix display of IP address for non-callchain path perf tools: Fix endian conversion reading event attr from file header perf tools: Add missing 'node' alias to the hw_cache[] array perf probe: Support adding probes on offline kernel modules perf probe: Add probed module in front of function perf probe: Introduce debuginfo to encapsulate dwarf information perf-probe: Move dwarf library routines to dwarf-aux.{c, h} perf probe: Remove redundant dwarf functions perf probe: Move strtailcmp to string.c perf probe: Rename DIE_FIND_CB_FOUND to DIE_FIND_CB_END tracing/kprobe: Update symbol reference when loading module tracing/kprobes: Support module init function probing kprobes: Return -ENOENT if probe point doesn't exist ...	2011-07-22 16:44:39 -07:00
Tejun Heo	24aa07882b	memblock, x86: Replace memblock_x86_reserve/free_range() with generic ones Other than sanity check and debug message, the x86 specific version of memblock reserve/free functions are simple wrappers around the generic versions - memblock_reserve/free(). This patch adds debug messages with caller identification to the generic versions and replaces x86 specific ones and kills them. arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty after this change and removed. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:53 -07:00
Tejun Heo	474b881bf4	x86: Use absent_pages_in_range() instead of memblock_x86_hole_size() memblock_x86_hole_size() calculates the total size of holes in a given range according to memblock and is used by numa emulation code and numa_meminfo_cover_memory(). Since conversion to MEMBLOCK_NODE_MAP, absent_pages_in_range() also uses memblock and gives the same result. This patch replaces memblock_x86_hole_size() uses with absent_pages_in_range(). After the conversion the x86 function doesn't have any user left and is killed. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-12-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:51 -07:00
Tejun Heo	6b5d41a1b9	memblock, x86: Reimplement memblock_find_dma_reserve() using iterators memblock_find_dma_reserve() wants to find out how much memory is reserved under MAX_DMA_PFN. memblock_x86_memory_[free_]in_range() are used to find out the amounts of all available and free memory in the area, which are then subtracted to find out the amount of reservation. memblock_x86_memblock_[free_]in_range() are implemented using __memblock_x86_memory_in_range() which builds ranges from memblock and then count them, which is rather unnecessarily complex. This patch open codes the counting logic directly in memblock_find_dma_reserve() using memblock iterators and removes now unused __memblock_x86_memory_in_range() and find_range_array(). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-11-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:50 -07:00
Tejun Heo	8a9ca34c11	memblock, x86: Replace __get_free_all_memory_range() with for_each_free_mem_range() __get_free_all_memory_range() walks memblock, calculates free memory areas and fills in the specified range. It can be easily replaced with for_each_free_mem_range(). Convert free_low_memory_core_early() and add_highpages_with_active_regions() to for_each_free_mem_range(). This leaves __get_free_all_memory_range() without any user. Kill it and related functions. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-10-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:49 -07:00
Tejun Heo	64a02daacb	memblock, x86: Make free_all_memory_core_early() explicitly free lowmem only nomemblock is currently used only by x86 and on x86_32 free_all_memory_core_early() silently freed only the low mem because get_free_all_memory_range() in arch/x86/mm/memblock.c implicitly limited range to max_low_pfn. Rename free_all_memory_core_early() to free_low_memory_core_early() and make it call __get_free_all_memory_range() and limit the range to max_low_pfn explicitly. This makes things clearer and also is consistent with the bootmem behavior. This leaves get_free_all_memory_range() without any user. Kill it. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-9-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:49 -07:00
Tejun Heo	8d89ac8084	x86: Replace memblock_x86_find_in_range_size() with for_each_free_mem_range() setup_bios_corruption_check() and memtest do_one_pass() open code memblock free area iteration using memblock_x86_find_in_range_size(). Convert them to use for_each_free_mem_range() instead. This leaves memblock_x86_find_in_range_size() and memblock_x86_check_reserved_size() unused. Kill them. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310462166-31469-8-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:48 -07:00
Tejun Heo	0608f70c78	x86: Use HAVE_MEMBLOCK_NODE_MAP From 5732e1247898d67cbf837585150fe9f68974671d Mon Sep 17 00:00:00 2001 From: Tejun Heo <tj@kernel.org> Date: Thu, 14 Jul 2011 11:22:16 +0200 Convert x86 to HAVE_MEMBLOCK_NODE_MAP. The only difference in memory handling is that allocations can't no longer cross node boundaries whether they're node affine or not, which shouldn't matter at all. This conversion will enable further simplification of boot memory handling. -v2: Fix build failure on !NUMA configurations discovered by hpa. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714094423.GG3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:47:43 -07:00
Tejun Heo	eb40c4c27f	memblock, x86: Replace memblock_x86_find_in_range_node() with generic memblock calls With the previous changes, generic NUMA aware memblock API has feature parity with memblock_x86_find_in_range_node(). There currently are two users - x86 setup_node_data() and __alloc_memory_core_early() in nobootmem.c. This patch converts the former to use memblock_alloc_nid() and the latter memblock_find_range_in_node(), and kills memblock_x86_find_in_range_node() and related functions including find_memory_early_core_early() in page_alloc.c. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310460395-30913-9-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:45:35 -07:00
Tejun Heo	5dfe8660a3	bootmem: Replace work_with_active_regions() with for_each_mem_pfn_range() Callback based iteration is cumbersome and much less useful than for_each_*() iterator. This patch implements for_each_mem_pfn_range() which replaces work_with_active_regions(). All the current users of work_with_active_regions() are converted. This simplifies walking over early_node_map and will allow converting internal logics in page_alloc to use iterator instead of walking early_node_map directly, which in turn will enable moving node information to memblock. powerpc change is only compile tested. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110714074610.GD3455@htj.dyndns.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-14 11:45:29 -07:00
Tejun Heo	1f5026a7e2	memblock: Kill MEMBLOCK_ERROR `25818f0f28` (memblock: Make MEMBLOCK_ERROR be 0) thankfully made MEMBLOCK_ERROR 0 and there already are codes which expect error return to be 0. There's no point in keeping MEMBLOCK_ERROR around. End its misery. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1310457490-3356-6-git-send-email-tj@kernel.org Cc: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-13 16:36:01 -07:00
Tejun Heo	1e01979c8f	x86, numa: Implement pfn -> nid mapping granularity check SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use sections array to map pfn to nid which is limited in granularity. If NUMA nodes are laid out such that the mapping cannot be accurate, boot will fail triggering BUG_ON() in mminit_verify_page_links(). On 32bit, it's 512MiB w/ PAE and SPARSEMEM. This seems to have been granular enough until commit `2706a0bf7b` (x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too). Apparently, there is a machine which aligns NUMA nodes to 128MiB and has only AMD NUMA but not SRAT. This led to the following BUG_ON(). On node 0 totalpages: 2096615 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 3927 pages, LIFO batch:0 Normal zone: 1740 pages used for memmap Normal zone: 220978 pages, LIFO batch:31 HighMem zone: 16405 pages used for memmap HighMem zone: 1853533 pages, LIFO batch:31 BUG: Int 6: CR2 (null) EDI (null) ESI 00000002 EBP 00000002 ESP c1543ecc EBX f2400000 EDX 00000006 ECX (null) EAX 00000001 err (null) EIP c16209aa CS 00000060 flg 00010002 Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000 (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe (null) f7200b80 c16395f0 00200a02 f7200a80 (null) 000375fe 00000002 (null) Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0b #17 Call Trace: [<c136b1e5>] ? early_fault+0x2e/0x2e [<c16209aa>] ? mminit_verify_page_links+0x12/0x42 [<c1620613>] ? memmap_init_zone+0xaf/0x10c [<c1620929>] ? free_area_init_node+0x2b9/0x2e3 [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451 [<c1601d80>] ? paging_init+0x112/0x118 [<c15f578d>] ? setup_arch+0x791/0x82f [<c15f43d9>] ? start_kernel+0x6a/0x257 This patch implements node_map_pfn_alignment() which determines maximum internode alignment and update numa_register_memblks() to reject NUMA configuration if alignment exceeds the pfn -> nid mapping granularity of the memory model as determined by PAGES_PER_SECTION. This makes the problematic machine boot w/ flatmem by rejecting the NUMA config and provides protection against crazy NUMA configurations. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com> Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Conny Seidel <conny.seidel@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 21:58:29 -07:00
Tejun Heo	d0ead15738	x86, mm: s/PAGES_PER_ELEMENT/PAGES_PER_SECTION/ DISCONTIGMEM on x86-32 implements pfn -> nid mapping similarly to SPARSEMEM; however, it calls each mapping unit ELEMENT instead of SECTION. This patch renames it to SECTION so that PAGES_PER_SECTION is valid for both DISCONTIGMEM and SPARSEMEM. This will be used by the next patch to implement mapping granularity check. This patch is trivial constant rename. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110712074422.GA2872@htj.dyndns.org Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-07-12 21:58:11 -07:00
Benjamin Herrenschmidt	a63fdc5156	mm: Move definition of MIN_MEMORY_BLOCK_SIZE to a header The macro MIN_MEMORY_BLOCK_SIZE is currently defined twice in two .c files, and I need it in a third one to fix a powerpc bug, so let's first move it into a header Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Ingo Molnar <mingo@elte.hu>	2011-07-12 11:08:01 +10:00
Ingo Molnar	931da6137e	Merge branch 'tip/perf/core-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core	2011-07-05 11:55:43 +02:00
Peter Zijlstra	a8b0ca17b8	perf: Remove the nmi parameter from the swevent and overflow interface The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-07-01 11:06:35 +02:00
Maarten Lankhorst	7d68dc3f10	x86, efi: Do not reserve boot services regions within reserved areas Commit `916f676f8d` started reserving boot service code since some systems require you to keep that code around until SetVirtualAddressMap is called. However, in some cases those areas will overlap with reserved regions. The proper medium-term fix is to fix the bootloader to prevent the conflicts from occurring by moving the kernel to a better position, but the kernel should check for this possibility, and only reserve regions which can be reserved. Signed-off-by: Maarten Lankhorst <m.b.lankhorst@gmail.com> Link: http://lkml.kernel.org/r/4DF7A005.1050407@gmail.com Acked-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-06-18 22:48:49 +02:00
Masami Hiramatsu	395810627b	x86: Swap save_stack_trace_regs parameters Swap the 1st and 2nd parameters of save_stack_trace_regs() as same as the parameters of save_stack_trace_tsk(). Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung@gmail.com> Link: http://lkml.kernel.org/r/20110608070921.17777.31103.stgit@fedora15 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2011-06-14 22:48:51 -04:00
Joe Perches	c4d017f213	x86: Convert vmalloc()+memset() to vzalloc() Signed-off-by: Joe Perches <joe@perches.com> Cc: Jiri Kosina <trivial@kernel.org> Link: http://lkml.kernel.org/r/10e35243fda0b8739c89ac32a7bdf348ec4752e1.1306603968.git.joe@perches.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-28 19:53:57 +02:00
KOSAKI Motohiro	b80ef10e84	x86: Move do_page_fault()'s error path under unlikely() Ingo suggested SIGKILL check should be moved into slowpath function. This will reduce the page fault fastpath impact of this recent commit: `37b23e0525`: x86,mm: make pagefault killable Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: kamezawa.hiroyu@jp.fujitsu.com Cc: minchan.kim@gmail.com Cc: willy@linux.intel.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/4DDE0B5C.9050907@jp.fujitsu.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-26 13:54:03 +02:00
Peter Zijlstra	3d48ae45e7	mm: Convert i_mmap_lock to a mutex Straightforward conversion of i_mmap_lock to a mutex. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:18 -07:00
Peter Zijlstra	1c39517696	mm: now that all old mmu_gather code is gone, remove the storage Fold all the mmu_gather rework patches into one for submission Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Hugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:16 -07:00
KOSAKI Motohiro	37b23e0525	x86,mm: make pagefault killable When an oom killing occurs, almost all processes are getting stuck at the following two points. 1) __alloc_pages_nodemask 2) __lock_page_or_retry 1) is not very problematic because TIF_MEMDIE leads to an allocation failure and getting out from page allocator. 2) is more problematic. In an OOM situation, zones typically don't have page cache at all and memory starvation might lead to greatly reduced IO performance. When a fork bomb occurs, TIF_MEMDIE tasks don't die quickly, meaning that a fork bomb may create new process quickly rather than the oom-killer killing it. Then, the system may become livelocked. This patch makes the pagefault interruptible by SIGKILL. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:08 -07:00
Linus Torvalds	d7ef64a9f9	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Eliminate various 'set but not used' warnings x86, SMEP: Fix section mismatch warnings x86, amd: Use _safe() msr access for GartTlbWlk disable code	2011-05-23 08:51:55 -07:00
Gustavo F. Padovan	6ec5ff4bc3	x86: Eliminate various 'set but not used' warnings Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi> Cc: Joerg Roedel <joerg.roedel@amd.com> (supporter:AMD IOMMU (AMD-VI)) Cc: iommu@lists.linux-foundation.org (open list:AMD IOMMU (AMD-VI)) Link: http://lkml.kernel.org/r/1305918786-7239-3-git-send-email-padovan@profusion.mobi Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-21 19:10:33 +02:00
Linus Torvalds	268bb0ce3e	sanitize <linux/prefetch.h> usage Commit `e66eed651f` ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw(' -- '.[ch]') grep -L 'prefetchw(' $(git grep -l 'linux/prefetch.h' -- '.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-20 12:50:29 -07:00
Linus Torvalds	13588209aa	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits) x86, mm: Allow ZONE_DMA to be configurable x86, NUMA: Trim numa meminfo with max_pfn in a separate loop x86, NUMA: Rename setup_node_bootmem() to setup_node_data() x86, NUMA: Enable emulation on 32bit too x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too x86, NUMA: Rename amdtopology_64.c to amdtopology.c x86, NUMA: Make numa_init_array() static x86, NUMA: Make 32bit use common NUMA init path x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() x86-32, NUMA: Add @start and @end to init_alloc_remap() x86, NUMA: Remove long 64bit assumption from numa.c x86, NUMA: Enable build of generic NUMA init code on 32bit x86, NUMA: Move NUMA init logic from numa_64.c to numa.c x86-32, NUMA: Update numaq to use new NUMA init protocol x86-32, NUMA: Replace srat_32.c with srat.c x86-32, NUMA: implement temporary NUMA init shims x86, NUMA: Move numa_nodes_parsed to numa.[hc] x86-32, NUMA: Move get_memcfg_numa() into numa_32.c x86, NUMA: make srat.c 32bit safe x86, NUMA: rename srat_64.c to srat.c ...	2011-05-19 18:07:31 -07:00
David Rientjes	dc382fd5bc	x86, mm: Allow ZONE_DMA to be configurable ZONE_DMA is unnecessary for a large number of machines that do not require less than 32-bit DMA addressing, e.g. ISA legacy DMA or PCI cards with a restricted DMA address mask. This patch allows users to disable ZONE_DMA for x86 if they know they will not be using such devices with their kernel. This prevents the VM from unnecessarily reserving a ratio of memory (defaulting to 1/256th of system capacity) with lowmem_reserve_ratio for such allocations when it will never be used. Signed-off-by: David Rientjes <rientjes@google.com> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1105161353560.4353@chino.kir.corp.google.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-05-16 14:03:28 -07:00
Sedat Dilek	53f8023feb	x86/mm: Fix section mismatch derived from native_pagetable_reserve() With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415: LD vmlinux.o MODPOST vmlinux.o WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range() The function native_pagetable_reserve() references the function __init memblock_x86_reserve_range(). This is often because native_pagetable_reserve lacks a __init annotation or the annotation of memblock_x86_reserve_range is wrong. This patch fixes the issue. Thanks to pipacs from PaX project for help on IRC. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-12 13:05:05 -04:00
Stefano Stabellini	279b706bf8	x86,xen: introduce x86_init.mapping.pagetable_reserve Introduce a new x86_init hook called pagetable_reserve that at the end of init_memory_mapping is used to reserve a range of memory addresses for the kernel pagetable pages we used and free the other ones. On native it just calls memblock_x86_reserve_range while on xen it also takes care of setting the spare memory previously allocated for kernel pagetable pages from RO to RW, so that it can be used for other purposes. A detailed explanation of the reason why this hook is needed follows. As a consequence of the commit: commit `4b239f458c` Author: Yinghai Lu <yinghai@kernel.org> Date: Fri Dec 17 16:58:28 2010 -0800 x86-64, mm: Put early page table high at some point init_memory_mapping is going to reach the pagetable pages area and map those pages too (mapping them as normal memory that falls in the range of addresses passed to init_memory_mapping as argument). Some of those pages are already pagetable pages (they are in the range pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and everything is fine. Some of these pages are not pagetable pages yet (they fall in the range pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they are going to be mapped RW. When these pages become pagetable pages and are hooked into the pagetable, xen will find that the guest has already a RW mapping of them somewhere and fail the operation. The reason Xen requires pagetables to be RO is that the hypervisor needs to verify that the pagetables are valid before using them. The validation operations are called "pinning" (more details in arch/x86/xen/mmu.c). In order to fix the issue we mark all the pages in the entire range pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation is completed only the range pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping. Hence the kernel is going to crash as soon as one of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those ranges are RO). For this reason we need a hook to reserve the kernel pagetable pages we used and free the other ones so that they can be reused for other purposes. On native it just means calling memblock_x86_reserve_range, on Xen it also means marking RW the pagetable pages that we allocated before but that haven't been used before. Another way to fix this is without using the hook is by adding a 'if (xen_pv_domain)' in the 'init_memory_mapping' code and calling the Xen counterpart, but that is just nasty. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>	2011-05-12 13:05:04 -04:00
Yinghai Lu	e5a10c1bd1	x86, NUMA: Trim numa meminfo with max_pfn in a separate loop During testing 32bit numa unifying code from tj, found one system with more than 64g fails to use numa. It turns out we do not trim numa meminfo correctly against max_pfn in case start address of a node is higher than 64GiB. Bug fix made it to tip tree. This patch moves the checking and trimming to a separate loop. So we don't need to compare low/high in following merge loops. It makes the code more readable. Also it makes the node merge printouts less strange. On a 512GiB numa system with 32bit, before: > NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000) > NUMA: Node 0 [0,80000000) + [100000000,1080000000) -> [0,1000000000) after: > NUMA: Node 0 [0,a0000) + [100000,80000000) -> [0,80000000) > NUMA: Node 0 [0,80000000) + [100000000,1000000000) -> [0,1000000000) Signed-off-by: Yinghai Lu <yinghai@kernel.org> [Updated patch description and comment slightly.] Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 17:24:49 +02:00
Yinghai Lu	a56bca80db	x86, NUMA: Rename setup_node_bootmem() to setup_node_data() After using memblock to replace bootmem, that function only sets up node_data now. Change the name to reflect what it actually does. tj: Minor adjustment to the patch description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 17:24:49 +02:00
Tejun Heo	1b7e03ef75	x86, NUMA: Enable emulation on 32bit too Now that NUMA init path is unified, NUMA emulation can be enabled on 32bit. Make numa_emluation.c safe on 32bit by doing the followings. * Define MAX_DMA32_PFN on 32bit too. * Include bootmem.h for max_pfn declaration. * Use u64 explicitly and always use PFN_PHYS() when converting page number to address. * Avoid __udivdi3() generation on 32bit by doing number of pages calculation instead in split_nodes_interleave(). And drop X86_64 dependency from Kconfig. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	2706a0bf7b	x86, NUMA: Enable CONFIG_AMD_NUMA on 32bit too Now that NUMA init path is unified, amdtopology can be enabled on 32bit. Make amdtopology.c safe on 32bit by explicitly using u64 and drop X86_64 dependency from Kconfig. Inclusion of bootmem.h is added for max_pfn declaration. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	c6f5887820	x86, NUMA: Rename amdtopology_64.c to amdtopology.c amdtopology is going to be used by 32bit too drop _64 suffix. This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	752d4f372f	x86, NUMA: Make numa_init_array() static numa_init_array() no longer has users outside of numa.c. Make it static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	bd6709a91a	x86, NUMA: Make 32bit use common NUMA init path With both _numa_init() methods converted and the rest of init code adjusted, numa_32.c now can switch from the 32bit only init code to the common one in numa.c. * Shim get_memcfg_()'s are dropped and initmem_init() calls x86_numa_init(), which is updated to handle NUMAQ. All boilerplate operations including node range limiting, pgdat alloc/init are handled by numa_init(). 32bit only implementation is removed. * 32bit numa_add_memblk(), numa_set_distance() and memory_add_physaddr_to_nid() removed and common versions in numa_32.c enabled for 32bit. This change causes the following behavior changes. * NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized for 32bit too. * Much more sanity checks and configuration cleanups. * Proper handling of node distances. * The same NUMA init messages as 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 17:24:48 +02:00
Tejun Heo	7888e96b26	x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() setup_node_bootmem() is taken from 64bit and doesn't use remap allocator. It's about to be shared with 32bit so add support for it. If NODE_DATA is remapped, it's noted in the debug message and node locality check is skipped as the __pa() of the remapped address doesn't reflect the actual physical address. On 64bit, remap allocator becomes noop and doesn't affect the behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:54 +02:00
Tejun Heo	99cca492ea	x86-32, NUMA: Add @start and @end to init_alloc_remap() Instead of dereferencing node_start/end_pfn[] directly, make init_alloc_remap() take @start and @end and let the caller be responsible for making sure the range is sane. This is to prepare for use from unified NUMA init code. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:54 +02:00
Tejun Heo	38f3e1ca24	x86, NUMA: Remove long 64bit assumption from numa.c Code moved from numa_64.c has assumption that long is 64bit in several places. This patch removes the assumption by using {s\|u}64_t explicity, using PFN_PHYS() for page number -> addr conversions and adjusting printf formats. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	744baba0c4	x86, NUMA: Enable build of generic NUMA init code on 32bit Generic NUMA init code was moved to numa.c from numa_64.c but is still guaraded by CONFIG_X86_64. This patch removes the compile guard and enables compiling on 32bit. * numa_add_memblk() and numa_set_distance() clash with the shim implementation in numa_32.c and are left out. * memory_add_physaddr_to_nid() clashes with 32bit implementation and is left out. * MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32. * node_data definition in numa_32.c removed in favor of the one in numa.c. There are places where ulong is assumed to be 64bit. The next patch will fix them up. Note that although the code is compiled it isn't used yet and this patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	a4106eae65	x86, NUMA: Move NUMA init logic from numa_64.c to numa.c Move the generic 64bit NUMA init machinery from numa_64.c to numa.c. * node_data[], numa_mem_info and numa_distance * numa_add_memblk[_to](), numa_remove_memblk[_from]() * numa_set_distance() and friends * numa_init() and all the numa_meminfo handling helpers called from it * dummy_numa_init() * memory_add_physaddr_to_nid() A new function x86_numa_init() is added and the content of numa_64.c::initmem_init() is moved into it. initmem_init() now simply calls x86_numa_init(). Constants and numa_off declaration are moved from numa_{32\|64}.h to numa.h. This is code reorganization and doesn't involve any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	299a180aec	x86-32, NUMA: Update numaq to use new NUMA init protocol Update numaq such that it calls numa_add_memblk() and sets numa_nodes_parsed instead of directly diddling with NUMA states. The original get_memcfg_numaq() is renamed to numaq_numa_init() and new get_memcfg_numaq() is created in numa_32.c. The shim numa_add_memblk() implementation handles node_start/end_pfn[] and node_set_online() for nodes with memory. The new get_memcfg_numaq() exactly the same with get_memcfg_from_srat() other than calling the numaq init function. Things get_memcfgs_numaq() do are not strictly necessary for numaq but added for consistency and to help unifying NUMA init handling. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	5acd91ab83	x86-32, NUMA: Replace srat_32.c with srat.c SRAT support implementation in srat_32.c and srat.c are generally similar; however, there are some differences. First of all, 64bit implementation supports more types of SRAT entries. 64bit supports x2apic, affinity, memory and SLIT. 32bit only supports processor and memory. Most other differences stem from different initialization protocols employed by 64bit and 32bit NUMA init paths. On 64bit, * Mappings among PXM, node and apicid are directly done in each SRAT entry callback. * Memory affinity information is passed to numa_add_memblk() which takes care of all interfacing with NUMA init. * Doesn't directly initialize NUMA configurations. All the information is recorded in numa_nodes_parsed and memblks. On 32bit, * Checks numa_off. * Things go through one more level of indirection via private tables but eventually end up initializing the same mappings. * node_start/end_pfn[] are initialized and memblock_x86_register_active_regions() is called for each memory chunk. * node_set_online() is called for each online node. * sort_node_map() is called. There are also other minor differences in sanity checking and messages but taking 64bit version should be good enough. This patch drops the 32bit specific implementation and makes the 64bit implementation common for both 32 and 64bit. The init protocol differences are dealt with in two places - the numa_add_memblk() shim added in the previous patch and new temporary numa_32.c:get_memcfg_from_srat() which wraps invocation of x86_acpi_numa_init(). The shim numa_add_memblk() handles the folowings. * node_start/end_pfn[] initialization. * node_set_online() for memory nodes. * Invocation of memblock_x86_register_active_regions(). The shim get_memcfg_from_srat() handles the followings. * numa_off check. * node_set_online() for CPU nodes. * sort_node_map() invocation. * Clearing of numa_nodes_parsed and active_ranges on failure. The shims are temporary and will be removed as the generic NUMA init path in 32bit is replaced with 64bit one. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	b0d310801a	x86-32, NUMA: implement temporary NUMA init shims To help transition to common NUMA init, implement temporary 32bit shims for numa_add_memblk() and numa_set_distance(). numa_add_memblk() registers the memblk and adjusts node_start/end_pfn[]. numa_set_distance() is noop. These shims will allow using 64bit NUMA init functions on 32bit and gradual transition to common NUMA init path. For detailed description, please read description of commits which make use of the shim functions. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	e6df595b37	x86, NUMA: Move numa_nodes_parsed to numa.[hc] Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for NUMA init path unification. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	daf4f480ae	x86-32, NUMA: Move get_memcfg_numa() into numa_32.c There's no reason get_memcfg_numa() to be implemented inline in mmzone_32.h. Move it to numa_32.c and also make get_memcfg_numa_flag() static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:53 +02:00
Tejun Heo	eca9ad3132	x86, NUMA: make srat.c 32bit safe Make srat.c 32bit safe by removing the assumption that unsigned long is 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	7b2600f8ee	x86, NUMA: rename srat_64.c to srat.c Rename srat_64.c to srat.c. This is to prepare for unification of NUMA init paths between 32 and 64bit. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	1201e10a09	x86, NUMA: trivial cleanups * Kill no longer used struct bootnode. * Kill dangling declaration of pxm_to_nid() in numa_32.h. * Make setup_node_bootmem() static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	797390d855	x86-32, NUMA: use sparse_memory_present_with_active_regions() Instead of calling memory_present() for each region from NUMA init, call sparse_memory_present_with_active_regions() from paging_init() similarly to x86-64. For flat and numaq, this results in exactly the same memory_present() calls. For srat, if there are multiple memory chunks for a node, after this change, memory_present() will be called separately for each chunk instead of being called once to encompass the whole range, which doesn't cause any harm and actually is the better behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	6bd262731b	x86, NUMA: Unify 32/64bit numa_cpu_node() implementation Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is NUMAQ which returns valid mapping only after CPU is initialized during SMP bringup; thus, the previous patch to set apicid -> node in setup_local_APIC() makes __apicid_to_node[] always contain the correct mapping whether custom apic->x86_32_numa_cpu_node() is used or not. So, there is no reason to keep separate 32bit implementation. We can always consult __apicid_to_node[]. Move 64bit implementation from numa_64.c to numa.c and remove 32bit implementation from numa_32.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:52 +02:00
Tejun Heo	acd26d611e	x86-64, NUMA: simplify nodedata allocation With top-down memblock allocation, the allocation range limits in ealry_node_mem() can be simplified - try node-local first, then any node but in any case don't allocate below DMA limit. Remove early_node_mem() and implement simplified allocation directly in setup_node_bootmem(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Tejun Heo	ebe685f24e	x86-64, NUMA: trivial cleanups for setup_node_bootmem() Make the following trivial changes in preparation for further updates. * nodeid -> nid, nid -> tnid * use nd_ prefix for nodedata related variables * remove start/end_pfn and use start/end directly Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Tejun Heo	9688678a66	x86-64, NUMA: Simplify hotadd memory handling The only special handling NUMA needs to do for hotadd memory is determining the node for the hotadd memory given the address of it and there's nothing specific to specific config method used. srat_64.c does somewhat elaborate error checking on ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements memory_add_physaddr_to_nid() which determines the node for given hotadd address. This is almost completely redundant. All the information is already available to the generic NUMA code which already performs all the sanity checking and merging. All that's necessary is not using __initdata from numa_meminfo and providing a function which uses it to map address to node. Drop the specific implementation from srat_64.c and add generic memory_add_physaddr_to_nid() in numa_64.c, which is enabled if CONFIG_MEMORY_HOTPLUG is set. Other than dropping the code, srat_64.c doesn't need any change as it already calls numa_add_memblk() for hot pluggable regions which is enough. While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the same. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com>	2011-05-02 14:18:51 +02:00
Tejun Heo	ba67cf5cf2	Merge branch 'x86/urgent' into x86-mm Merge reason: Pick up the following two fix commits. `2be19102b7`: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() `765af22da8`: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change Scheduled NUMA init 32/64bit unification changes depend on these. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 14:16:47 +02:00
Tejun Heo	aff364860a	Merge branch 'x86/numa' into x86-mm Merge reason: Pick up x86-32 remap allocator cleanup changes - 14 commits, 3fe14ab541^..993ba1585c. `3fe14ab541`: x86-32, numa: Fix failure condition check in alloc_remap() `993ba1585c`: x86-32, numa: Update remap allocator comments Scheduled NUMA init 32/64bit unification changes depend on them. Signed-off-by: Tejun Heo <tj@kernel.org>	2011-05-02 14:08:47 +02:00
Yinghai Lu	2be19102b7	x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() numa_cleanup_meminfo() trims each memblk between low (0) and high (max_pfn) limits and discards empty ones. However, the emptiness detection incorrectly used equality test. If the start of a memblk is higher than max_pfn, it is empty but fails the equality test and doesn't get discarded. The condition triggers when max_pfn is lower than start of a NUMA node and results in memory misconfiguration - leading to WARN_ON()s and other funnies. The bug was discovered in devel branch where 32bit too uses this code path for NUMA init. If a node is above the addressing limit, max_pfn ends up lower than the node triggering this problem. The failure hasn't been observed on x86-64 but is still possible with broken hardware e820/NUMA info. As the fix is very low risk, it would be better to apply it even for 64bit. Fix it by using >= instead of ==. Signed-off-by: Yinghai Lu <yinghai@kernel.org> [ Extracted the actual fix from the original patch and rewrote patch description. ] Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-05-01 19:15:11 +02:00
Tim Gardner	c7a7b814c9	ioremap: Delay sanity check until after a successful mapping While tracking down the reason for an ioremap() failure I was distracted by the WARN_ONCE() in __ioremap_caller(). Performing a WARN_ONCE() sanity check before the mapping is successful seems pointless if the caller sends bad values. A case in point is when the BIOS provides erroneous screen_info values causing vesafb_probe() to request an outrageuous size. The WARN_ONCE is then wasted on bogosity. Move the warning to a point where the mapping has been successfully allocated. Addresses: http://bugs.launchpad.net/bugs/772042 Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Link: http://lkml.kernel.org/r/4DB99D2E.9080106@canonical.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-29 08:02:47 +02:00
David Rientjes	7a6c654782	x86, numa: Fix cpu nodemasks for NUMA emulation and CONFIG_DEBUG_PER_CPU_MAPS The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y when NUMA emulation is enabled is currently broken because it does not iterate through every emulated node and bind cpus that have affinity to it. NUMA emulation should bind each cpu to every local node to accurately represent the true NUMA topology of the underlying machine. debug_cpumask_set_cpu() needs to be fixed at the same time so that the debugging information that it emits shows the new cpumask of the node being assigned when the cpu is being added or removed. It can now take responsibility of setting or clearing the cpu itself to remove the need for duplicate code. Also change its last parameter, "enable", to have the correct bool type since it can only be true or false. -v2: Fix the return statements, by Kosaki Motohiro Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <herrmann.der.user@googlemail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-21 11:31:00 +02:00
Tejun Heo	993ba1585c	x86-32, numa: Update remap allocator comments Now that remap allocator is cleaned up, update comments such that they are in docbook function description format and reflect the actual implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-15-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:56 -07:00
Tejun Heo	198bd06bbf	x86-32, numa: Remove redundant node_remap_size[] Remap area size can be determined from node_remap_start_vaddr[] and node_remap_end_vaddr[] making node_remap_size[] redundant. Remove it. While at it, make resume_map_numa_kva() use @nr_pages for number of pages instead of @size. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-14-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:50 -07:00
Tejun Heo	1d85b61baf	x86-32, numa: Remove now useless node_remap_offset[] With lowmem address reservation moved into init_alloc_remap(), node_remap_offset[] is no longer useful. Remove it and related offset handling code. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-13-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:44 -07:00
Tejun Heo	b2e3e4fa3e	x86-32, numa: Make pgdat allocation use alloc_remap() pgdat allocation is handled differnetly from other remap allocations - it's reserved during initialization. There's no reason to handle this any differnetly. Remap allocator is initialized for every node and if init failed the allocation will fail and pgdat allocation can fall back to generic code like anyone else. Remove special init-time pgdat reservation and make allocate_pgdat() use alloc_remap() like everyone else. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-12-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:39 -07:00
Tejun Heo	2a286344f0	x86-32, numa: Move remapping for remap allocator into init_alloc_remap() There's no reason to perform the actual remapping separately. Collapse remap_numa_kva() into init_alloc_remap() and, while at it, make it less verbose. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-11-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:33 -07:00
Tejun Heo	0e9f93c1c0	x86-32, numa: Move lowmem address space reservation to init_alloc_remap() Remap alloc init is done in the following stages. 1. init_alloc_remap() calculates how much memory is necessary for each node and reserves node local memory. 2. initmem_init() collects how much each node needs and reserves a single contiguous lowmem area which can contain all. 3. init_remap_allocator() initializes allocator parameters from the determined lowmem address and per-node offsets. 4. Actual remap happens. There is no reason for the lowmem remap area to be reserved as a single contiguous area at one go. They don't interact with each other and the memblock allocator will put them side-by-side anyway. This patch breaks up the single lowmem address reservation and put per-node lowmem address reservation into init_alloc_remap() and initializes allocator parameters directly in the function as all the addresses are determined there. This merges steps 2 and 3 into 1. While at it, remove now largely irrelevant comments in init_alloc_remap(). This change causes the following behavior changes. * Remap lowmem areas are allocated in smaller per-node chunks. * Remap lowmem area reservation failure fail future remap allocations instead of panicking. * Remap allocator initialization is less verbose. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-10-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:27 -07:00
Tejun Heo	82044c328d	x86-32, numa: Make init_alloc_remap() less panicky Remap allocator failure isn't fatal. The callers are required to fall back to regular early memory allocation mechanisms on failure anyway, so there's no reason to panic on remap init failure. Whining and returning are enough. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-9-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:21 -07:00
Tejun Heo	7210cf9217	x86-32, numa: Calculate remap size in common code Only pgdat and memmap use remap area and there isn't much benefit in allowing per-node override. In addition, the use of node_remap_size[] is confusing in that it contains number of bytes before remap initialization and then number of pages afterwards. Move remap size calculation for memap from specific NUMA config implementations to init_alloc_remap() and make node_remap_size[] static. The only behavior difference is that, before this patch, numaq_32 didn't consider max_pfn when calculating the memmap size but it's enforced after this patch, which is the right thing to do. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-8-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:16 -07:00
Tejun Heo	af7c1a6e83	x86-32, numa: Make @size in init_aloc_remap() represent bytes @size variable in init_alloc_remap() is confusing in that it starts as number of bytes as its name implies and then becomes number of pages. Make it consistently represent bytes. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-7-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:11 -07:00
Tejun Heo	c4d4f577d4	x86-32, numa: Rename @node_kva to @node_pa in init_alloc_remap() init_alloc_remap() is about to do more and using _kva suffix for physical address becomes confusing because the function will be handling both physical and virtual addresses. Rename @node_kva to @node_pa. This is trivial rename and doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-6-git-send-email-tj@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:04 -07:00
Tejun Heo	5510db9c1b	x86-32, numa: Reorganize calculate_numa_remap_page() Separate the outer node walking loop and per-node logic from calculate_numa_remap_pages(). The outer loop is collapsed into initmem_init() and the per-node logic is moved into a new function - init_alloc_remap(). The new function name is confusing with the existing init_remap_allocator() and the behavior is the function isn't very clean either at this point, but this is to prepare for further cleanups and it will become prettier. This function doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-5-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:57:01 -07:00
Tejun Heo	5b8443b25c	x86-32, numa: Remove redundant top-down alloc code from remap initialization memblock_find_in_range() now does top-down allocation by default, so there's no reason for its callers to explicitly implement it by gradually lowering the start address. Remove redundant top-down allocation logic from init_meminit() and calculate_numa_remap_pages(). Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-4-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:56:57 -07:00
Tejun Heo	a6c24f7a70	x86-32, numa: Align pgdat size while initializing alloc_remap When pgdat is reserved in init_remap_allocator(), PAGE_SIZE aligned size will be used. Match the size alignment in initialization to avoid allocation failure down the road. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-3-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:56:52 -07:00
Tejun Heo	3fe14ab541	x86-32, numa: Fix failure condition check in alloc_remap() node_remap_{start\|end}_vaddr[] describe [start, end) ranges; however, alloc_remap() incorrectly failed when the current allocation + size equaled the end but it should fail only when it goes over. Fix it. Signed-off-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/1301955840-7246-2-git-send-email-tj@kernel.org Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-04-06 17:56:46 -07:00
Tejun Heo	765af22da8	x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change Commit `d8fc3afc49` (x86, NUMA: Move *_numa_init() invocations into initmem_init()) moved acpi_numa_init() call into NUMA initmem_init() but forgot to update 32bit NUMA init breaking ACPI NUMA configuration for 32bit. acpi_numa_init() call was later moved again to srat_64.c. Match it by adding the call to get_memcfg_from_srat() in srat_32.c. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: H. Peter Anvin <hpa@linux.intel.com> LKML-Reference: <20110404100645.GE1420@mtj.dyndns.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-04-04 16:56:33 +02:00
Florian Mickler	711b8c87a5	x86-64, NUMA: Remove unused variable In case !CONFIG_ACPI_NUMA and !CONFIG_AMD_NUMA gcc emits a warning about the unused variable ret. As that variable is in fact not needed I choose to remove it. Signed-off-by: Florian Mickler <florian@mickler.org> LKML-Reference: <1301843624-22364-1-git-send-email-florian@mickler.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-04-04 01:21:00 +02:00
Tejun Heo	052936080c	x86-64, NUMA: Remove custom phys_to_nid() implementation phys_to_nid() maps physical address to NUMA node id. This is implemented by building perfect hash in compute_hash_shift() during initialization. However, with SPARSE memory model, the nid is encoded in page flags. The perfect hash implementation was for DISCONTIG memory model which got removed years ago by `b263295dbf` (x86: 64-bit, make sparsemem vmemmap the only memory model). So, the perfect hash ends up being used only during initialization when the core SPARSE code already provides perfectly acceptable generic early_pfn_to_nid() implementation. Drop phys_to_nid() and use the generic ealry_pfn_to_nid() instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-04-01 11:15:12 +02:00
Linus Torvalds	b81a618dcd	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: deal with races in /proc//{syscall,stack,personality} proc: enable writing to /proc/pid/mem proc: make check_mem_permission() return an mm_struct on success proc: hold cred_guard_mutex in check_mem_permission() proc: disable mem_write after exec mm: implement access_remote_vm mm: factor out main logic of access_process_vm mm: use mm_struct to resolve gate vma's in __get_user_pages mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm mm: arch: make in_gate_area take an mm_struct instead of a task_struct mm: arch: make get_gate_vma take an mm_struct instead of a task_struct x86: mark associated mm when running a task in 32 bit compatibility mode x86: add context tag to mark mm when running a task in 32-bit compatibility mode auxv: require the target to be tracable (or yourself) close race in /proc//environ report errors in /proc//map* sanely pagemap: close races with suid execve make sessionid permissions in /proc//task/ match those in /proc/* fix leaks in path_lookupat() Fix up trivial conflicts in fs/proc/base.c	2011-03-23 20:51:42 -07:00
Stephen Wilson	cae5d39032	mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:55 -04:00
Stephen Wilson	83b964bbf8	mm: arch: make in_gate_area take an mm_struct instead of a task_struct Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:54 -04:00
Stephen Wilson	31db58b3ab	mm: arch: make get_gate_vma take an mm_struct instead of a task_struct Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-23 16:36:54 -04:00
Linus Torvalds	73d5a8675f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: xen: update mask_rw_pte after kernel page tables init changes xen: set max_pfn_mapped to the last pfn mapped x86: Cleanup highmap after brk is concluded Fix up trivial onflict (added header file includes) in arch/x86/mm/init_64.c	2011-03-22 10:41:36 -07:00
Yinghai Lu	e5f15b45dd	x86: Cleanup highmap after brk is concluded Now cleanup_highmap actually is in two steps: one is early in head64.c and only clears above _end; a second one is in init_memory_mapping() and tries to clean from _brk_end to _end. It should check if those boundaries are PMD_SIZE aligned but currently does not. Also init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there. This patch moves cleanup_highmap() down after _brk_end is settled so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2011-03-19 11:58:19 -07:00
Linus Torvalds	f2e1fbb5f2	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Flush TLB if PGD entry is changed in i386 PAE mode x86, dumpstack: Correct stack dump info when frame pointer is available x86: Clean up csum-copy_64.S a bit x86: Fix common misspellings x86: Fix misspelling and align params x86: Use PentiumPro-optimized partial_csum() on VIA C7	2011-03-18 10:45:21 -07:00
Shaohua Li	4981d01ead	x86: Flush TLB if PGD entry is changed in i386 PAE mode According to intel CPU manual, every time PGD entry is changed in i386 PAE mode, we need do a full TLB flush. Current code follows this and there is comment for this too in the code. But current code misses the multi-threaded case. A changed page table might be used by several CPUs, every such CPU should flush TLB. Usually this isn't a problem, because we prepopulate all PGD entries at process fork. But when the process does munmap and follows new mmap, this issue will be triggered. When it happens, some CPUs keep doing page faults: http://marc.info/?l=linux-kernel&m=129915020508238&w=2 Reported-by: Yasunori Goto<y-goto@jp.fujitsu.com> Tested-by: Yasunori Goto<y-goto@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Shaohua Li<shaohua.li@intel.com> Cc: Mallick Asit K <asit.k.mallick@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm <linux-mm@kvack.org> Cc: stable <stable@kernel.org> LKML-Reference: <1300246649.2337.95.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 11:44:01 +01:00
Lucas De Marchi	0d2eb44f63	x86: Fix common misspellings They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-18 10:39:30 +01:00
Linus Torvalds	a5e6b135bd	Merge branch 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6 * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (50 commits) printk: do not mangle valid userspace syslog prefixes efivars: Add Documentation efivars: Expose efivars functionality to external drivers. efivars: Parameterize operations. efivars: Split out variable registration efivars: parameterize efivars efivars: Make efivars bin_attributes dynamic efivars: move efivars globals into struct efivars drivers:misc: ti-st: fix debugging code kref: Fix typo in kref documentation UIO: add PRUSS UIO driver support Fix spelling mistakes in Documentation/zh_CN/SubmittingPatches firmware: Fix unaligned memory accesses in dmi-sysfs firmware: Add documentation for /sys/firmware/dmi firmware: Expose DMI type 15 System Event Log firmware: Break out system_event_log in dmi-sysfs firmware: Basic dmi-sysfs support firmware: Add DMI entry types to the headers Driver core: convert platform_{get,set}_drvdata to static inline functions Translate linux-2.6/Documentation/magic-number.txt into Chinese ...	2011-03-16 15:05:40 -07:00
Xiao Guangrong	25542c646a	x86, tlb, UV: Do small micro-optimization for native_flush_tlb_others() native_flush_tlb_others() is called from: flush_tlb_current_task() flush_tlb_mm() flush_tlb_page() All these functions disable preemption explicitly, so we can use smp_processor_id() instead of get_cpu() and put_cpu(). Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Cliff Wickman <cpw@sgi.com> LKML-Reference: <4D7EC791.4040003@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-15 08:30:34 +01:00
Ingo Molnar	8460b3e5bc	Merge commit 'v2.6.38' into x86/mm Conflicts: arch/x86/mm/numa_64.c Merge reason: Resolve the conflict, update the branch to .38. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-15 08:29:44 +01:00
Tejun Heo	56396e6823	x86-64, NUMA: Don't call numa_set_distanc() for all possible node combinations during emulation The distance transforming in numa_emulation() used to call numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node combinations regardless of which are enabled. As numa_set_distance() ignores all out-of-bound distance settings, this doesn't cause any problem other than looping unnecessarily many times during boot. However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the code such that it iterates through only the enabled combinations. Yinghai Lu identified the issue and provided an initial patch to address the issue; however, the patch was incorrect in that it didn't build emulated distance table when there's no physical distance table and unnecessarily complex. http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988 Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org>	2011-03-12 11:41:10 +01:00
Andrea Arcangeli	a79e53d856	x86/mm: Fix pgd_lock deadlock It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-10 09:41:57 +01:00
Andrey Vagin	f86268549f	x86/mm: Handle mm_fault_error() in kernel space mm_fault_error() should not execute oom-killer, if page fault occurs in kernel space. E.g. in copy_from_user()/copy_to_user(). This would happen if we find ourselves in OOM on a copy_to_user(), or a copy_from_user() which faults. Without this patch, the kernels hangs up in copy_from_user(), because OOM killer sends SIG_KILL to current process, but it can't handle a signal while in syscall, then the kernel returns to copy_from_user(), reexcute current command and provokes page_fault again. With this patch the kernel return -EFAULT from copy_from_user(). The code, which checks that page fault occurred in kernel space, has been copied from do_sigbus(). This situation is handled by the same way on powerpc, xtensa, tile, ... Signed-off-by: Andrey Vagin <avagin@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-10 09:41:40 +01:00
Tejun Heo	078a198906	x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation() Undetermined entries in emu_nid_to_phys[] are filled with zero assuming that physical node 0 is always online; however, this might not be true depending on hardware configuration. Find a physical node which is actually online and use it instead. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1103020628210.31626@chino.kir.corp.google.com>	2011-03-04 16:32:37 +01:00
Yinghai Lu	3b28cf32cc	x86, numa: Fix numa_emulation code with memory-less node0 This crash happens on a system that does not have RAM on node0. When numa_emulation is compiled in, and: 1. we boot the system without numa=fake... 2. or we boot the system with numa=fake=128 to make emulation fail we will get: [ 0.076025] ------------[ cut here ]------------ [ 0.080004] kernel BUG at arch/x86/mm/numa_64.c:788! [ 0.080004] invalid opcode: 0000 [#1] SMP [...] need to use early_cpu_to_node() directly, because cpu_to_apicid and apicid_to_node will return node0 that is not onlined. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: David Rientjes <rientjes@google.com> LKML-Reference: <4D6ECF72.5010308@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-03-04 15:20:19 +01:00
David Rientjes	c09cedf4f7	x86-64, NUMA: Clean up initmem_init() This patch cleans initmem_init() so that it is more readable and doesn't use an unnecessary array of function pointers to convolute the flow of the code. It also makes it obvious that dummy_numa_init() will always succeed (and documents that requirement) so that the existing BUG() is never actually reached. No functional change. -tj: Updated comment for dummy_numa_init() slightly. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-04 15:17:21 +01:00
Yinghai Lu	51b361b400	x86-64, NUMA: Fix numa_emulation code with node0 without RAM On one system that does not have RAM on node0. When numa_emulation is compiled in, and 1. boot system without numa=fake... 2. or boot system with numa=fake=128 to make emulation fail will get: [ 0.092026] ------------[ cut here ]------------ [ 0.096005] kernel BUG at arch/x86/mm/numa_emulation.c:439! [ 0.096005] invalid opcode: 0000 [#1] SMP [ 0.096005] last sysfs file: [ 0.096005] CPU 0 [ 0.096005] Modules linked in: [ 0.096005] [ 0.096005] Pid: 0, comm: swapper Not tainted 2.6.38-rc6-tip-yh-03869-gcb0491d-dirty #684 Sun Microsystems Sun Fire X4240/Sun Fire X4240 [ 0.096005] RIP: 0010:[<ffffffff81cdc65b>] [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf [ 0.096005] RSP: 0000:ffffffff82437ed8 EFLAGS: 00010246 ... [ 0.096005] Call Trace: [ 0.096005] [<ffffffff81cd7931>] identify_cpu+0x2d7/0x2df [ 0.096005] [<ffffffff827e54fa>] identify_boot_cpu+0x10/0x30 [ 0.096005] [<ffffffff827e5704>] check_bugs+0x9/0x2d [ 0.096005] [<ffffffff827dceda>] start_kernel+0x3d7/0x3f1 [ 0.096005] [<ffffffff827dc2cc>] x86_64_start_reservations+0x9c/0xa0 [ 0.096005] [<ffffffff827dc4ad>] x86_64_start_kernel+0x1dd/0x1e8 [ 0.096005] Code: 74 06 48 8d 04 90 eb 0f 48 c7 c0 30 d9 00 00 48 03 04 d5 90 0f 60 82 8b 00 83 f8 ff 74 0d 0f a3 05 8b 7e 92 00 19 d2 85 d2 75 02 <0f> 0b 48 98 be 00 01 00 00 48 c7 c7 e0 44 60 82 44 8b 2c 85 e0 [ 0.096005] RIP [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf [ 0.096005] RSP <ffffffff82437ed8> [ 0.096026] ---[ end trace a7919e7f17c0a725 ]--- We need to use early_cpu_to_node() directly, because numa_cpu_node() will return node0 that is not onlined. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-03-04 14:49:28 +01:00
Tejun Heo	f891125028	x86-64, NUMA: Revert NUMA affine page table allocation This patch reverts NUMA affine page table allocation added by commit `1411e0ec31` (x86-64, numa: Put pgtable to local node memory). The commit made an undocumented change where the kernel linear mapping strictly follows intersection of e820 memory map and NUMA configuration. If the physical memory configuration has holes or NUMA nodes are not properly aligned, this leads to using unnecessarily smaller mapping size which leads to increased TLB pressure. For details, http://thread.gmane.org/gmane.linux.kernel/1104672 Patches to fix the problem have been proposed but the underlying code needs more cleanup and the approach itself seems a bit heavy handed and it has been determined to revert the feature for now and come back to it in the next developement cycle. http://thread.gmane.org/gmane.linux.kernel/1105959 As init_memory_mapping_high() callsites have been consolidated since the commit, reverting is done manually. Also, the RED-PEN comment in arch/x86/mm/init.c is not restored as the problem no longer exists with memblock based top-down early memory allocation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de>	2011-03-04 10:26:36 +01:00
Tejun Heo	eb8c1e2c83	x86-64, NUMA: Better explain numa_distance handling Handling of out-of-bounds distances and allocation failure can use better documentation. Add it. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com>	2011-03-02 16:34:21 +01:00
Yinghai Lu	ce0033307f	x86-64, NUMA: Fix distance table handling NUMA distance table handling has the following problems. * numa_reset_distance() uses numa_distance * sizeof(numa_distance[0]) as the table size when it should be using the square of numa_distance. * The same size miscalculation when allocation space for phys_dist in numa_emulation(). * In numa_emulation(), phys_dist must be reserved; otherwise, the new emulated distance table may overlap it. Fix them and, while at it, take numa_distance_cnt resetting in numa_reset_distance() out of the if block to simplify the code a bit. David Rientjes reported incorrect handling of distance table during emulation. -tj: Edited out numa_alloc_distance() related changes which weren't necessary and rewrote patch description. -v2: Ingo was unhappy with 80-column limit induced linebreaks. Let lines run over 80-column. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reported-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: David Rientjes <rientjes@google.com>	2011-03-02 16:34:09 +01:00
David Rientjes	1f565a896e	x86-64, NUMA: Fix size of numa_distance array numa_distance should be sized like the SLIT, an NxN matrix where N is the highest node id + 1. This patch fixes the calculation to avoid overflowing the array on the subsequent iteration. -tj: The original patch used last index to calculate size. Yinghai pointed out it should be incremented so it is the number of elements instead of the last index to calculate the size of the table. Updated accordingly. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-25 10:10:54 +01:00
Yinghai Lu	d1b19426b0	x86: Rename e820_table_* to pgt_buf_* e820_table_{start\|end\|top}, which are used to buffer page table allocation during early boot, are now derived from memblock and don't have much to do with e820. Change the names so that they reflect what they're used for. This patch doesn't introduce any behavior change. -v2: Ingo found that earlier patch "x86: Use early pre-allocated page table buffer top-down" caused crash on 32bit and needed to be dropped. This patch was updated to reflect the change. -tj: Updated commit description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-24 14:52:18 +01:00
Yinghai Lu	2bf50555b0	x86-64, NUMA: Seperate out numa_alloc_distance() from numa_set_distance() Alloc code is much bigger the distance setting. Separate it out into numa_alloc_distance() for readability. -v2: Let alloc_numa_distance to return -ENOMEM on failing path, requested by tj. -tj: Description update. Minor tweaks including function name, location and return value check. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-22 11:18:49 +01:00
Tejun Heo	90e6b677b4	x86-64, NUMA: Add proper function comments to global functions Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com>	2011-02-22 11:10:08 +01:00
Tejun Heo	b8ef9172b2	x86-64, NUMA: Move NUMA emulation into numa_emulation.c Create numa_emulation.c and move all NUMA emulation code there. The definitions of struct numa_memblk and numa_meminfo are moved to numa_64.h. Also, numa_remove_memblk_from(), numa_cleanup_meminfo(), numa_reset_distance() along with numa_emulation() are made global. - v2: Internal declarations moved to numa_internal.h as suggested by Yinghai. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com>	2011-02-22 11:10:08 +01:00
Tejun Heo	fbe99959d1	x86-64, NUMA: Prepare numa_emulation() for moving NUMA emulation into a separate file Update numa_emulation() such that, it - takes @numa_meminfo and @numa_dist_cnt instead of directly referencing the global variables. - copies the distance table by iterating each distance with node_distance() instead of memcpy'ing the distance table. - tests emu_cmdline to determine whether emulation is requested and fills emu_nid_to_phys[] with identity mapping if emulation is not used. This allows the caller to call numa_emulation() unconditionally and makes return value unncessary. - defines dummy version if CONFIG_NUMA_EMU is disabled. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com>	2011-02-22 11:10:08 +01:00
Yinghai Lu	69efcc6d90	x86-64, NUMA: Do not scan two times for setup_node_bootmem() By the time setup_node_bootmem() is called, all the memblocks are already registered. As node_data is allocated from these memblocks, calling it more than once doesn't make any difference. Drop the loop. tj: Dropped comment referencing to the old behavior as suggested by David and rephrased the description. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-21 11:23:31 +01:00
Yinghai Lu	6d496f9f23	x86-64, NUMA: Put dummy_numa_init() in the init section dummy_numa_init() is used only during system boot. Put it in .init like other NUMA init functions. - tj: Description update. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>	2011-02-17 15:04:20 +01:00
Yinghai Lu	2ca230baeb	x86-64, NUMA: Don't call __pa() with invalid address in numa_reset_distance() Do not call __pa(numa_distance) if it was not allocated before. Calling with invalid address triggers VIRTUAL_BUG_ON() in __phys_addr() if CONFIG_DEBUG_VIRTUAL. Also reported by Ingo. http://thread.gmane.org/gmane.linux.kernel/1101306/focus=1101785 - v2: Change to check existing path as tj requested. - tj: Description update. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Ingo Molnar <mingo@elte.hu>	2011-02-17 15:03:43 +01:00
Tejun Heo	e23bba6044	x86-64, NUMA: Unify emulated distance mapping NUMA emulation needs to update node distance information. It did it by remapping apicid to PXM mapping, even when amdtopology is being used. There is no reason to go through such convolution. The generic code has all the information necessary to transform the distance table to the emulated nid space. Implement generic distance table transformation in numa_emulation() and drop private implementations in srat_64 and amdtopology_64. This makes find_node_by_addr() and fake_physnodes() and related functions unnecessary, drop them. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	6b78cb549b	x86-64, NUMA: Unify emulated apicid -> node mapping transformation NUMA emulation changes node mappings and thus apicid -> node mapping needs to be updated accordingly. srat_64 and amdtopology_64 did this separately; however, all the necessary information is the mapping from emulated nodes to physical nodes which is available in emu_nid_to_phys[]. Implement common __apicid_to_node[] transformation in numa_emulation() and drop duplicate implementations. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	1cca534073	x86-64, NUMA: Emulate directly from numa_meminfo NUMA emulation built physnodes[] array which could only represent configurations from the physical meminfo and emulated nodes using the information. There's no reason to take this extra level of indirection. Update emulation functions so that they operate directly on numa_meminfo. This simplifies the code and makes emulation layout behave better with interleaved physical nodes. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	775ee85d7b	x86-64, NUMA: Wrap node ID during emulation Both emulation layout functions - split_nodes[_size]_interleave() - didn't wrap emulated nid while laying out the fake nodes and tried to avoid interating over the specified number of nodes, which is fragile. Now that the emulation code generates numa_meminfo, the node memblks don't need to be consecutive and emulated node IDs can simply wrap. This makes the code more robust and is necessary for updates to better handle the cases where the physical nodes are interleaved. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	c88aea7a70	x86-64, NUMA: Make emulation code build numa_meminfo and share the registration path NUMA emulation code built nodes[] array and had its own registration path to set up the emulated nodes. Update it such that it generates emulated numa_meminfo and returns control to initmem_init() and shares the same registration path with non-emulated cases. Because {acpi\|amd}_fake_nodes() expect nodes[] parameter, fake_physnodes() now generates nodes[] from numa_meminfo. This will go away with further updates. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	9d073caeb3	x86-64, NUMA: Build and use direct emulated nid -> phys nid mapping NUMA emulation copied physical NUMA configuration into physnodes[] and used it to reverse-map emulated nodes to physical nodes, which is unnecessarily convoluted. Build emu_nid_to_phys[] array to map emulated nids directly to the matching physical nids and use it in numa_add_cpu(). physnodes[] will be removed with further patches. - v2: Build failure when CONFIG_DEBUG_PER_CPU_MAPS due to missing local variable definition fixed. Reported by Ingo. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	d9c515eacb	x86-64, NUMA: Trivial changes to prepare for emulation updates * Separate out numa_add_memblk_to() from numa_add_memblk() so that different numa_meminfo can be used. * Rename cmdline to emu_cmdline. * Drop @start/last_pfn from numa_emulation() and use max_pfn directly. This patch doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:10 +01:00
Tejun Heo	ac7136b611	x86-64, NUMA: Implement generic node distance handling Node distance either used direct node comparison, ACPI PXM comparison or ACPI SLIT table lookup. This patch implements generic node distance handling. NUMA init methods can call numa_set_distance() to set distance between nodes and the common __node_distance() implementation will report the set distance. Due to the way NUMA emulation is implemented, the generic node distance handling is used only when emulation is not used. Later patches will update NUMA emulation to use the generic distance mechanism. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	4697bdcc94	x86-64, NUMA: Kill mem_nodes_parsed With all memory configuration information now carried in numa_meminfo, there's no need to keep mem_nodes_parsed separate. Drop it and use numa_nodes_parsed for CPU / memory-less nodes. A new helper numa_nodemask_from_meminfo() is added to calculate memnode mask on the fly which is currently used to set node_possible_map. This simplifies NUMA init methods a bit and removes a source of possible inconsistencies. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	92d4a4371e	x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed It's no longer necessary to keep both cpu_nodes_parsed and mem_nodes_parsed. In preparation for merge, rename cpu_nodes_parsed to numa_nodes_parsed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	91556237ec	x86-64, NUMA: Kill numa_nodes[] numa_nodes[] doesn't carry any information which isn't present in numa_meminfo. Each entry is simply min/max range of all the memblks for the node. This is not only redundant but also inaccurate when memblks for different nodes interleave - for example, find_node_by_addr() can return the wrong nodeid. Kill numa_nodes[] and always use numa_meminfo instead. * nodes_cover_memory() is renamed to numa_meminfo_cover_memory() and now operations on numa_meminfo and returns bool. * setup_node_bootmem() needs min/max range. Compute the range on the fly. setup_node_bootmem() invocation is restructured to use outer loop instead of hardcoding the double invocations. * find_node_by_addr() now operates on numa_meminfo. * setup_physnodes() builds physnodes[] from memblks. This will go away when emulation code is updated to use struct numa_meminfo. This patch also makes the following misc changes. * Clearing of nodes_add[] clearing is converted to memset(). * numa_add_memblk() in amd_numa_init() is moved down a bit for consistency. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	a844ef46fa	x86-64, NUMA: Add common find_node_by_addr() srat_64.c and amdtopology_64.c had their own versions of find_node_by_addr() which were basically the same. Add common one in numa_64.c and remove the duplicates. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	56e827fbde	x86-64, NUMA: consolidate and improve memblk sanity checks memblk sanity check was scattered around and incomplete. Consolidate and improve. * Confliction detection and cutoff_node() logic are moved to numa_cleanup_meminfo(). * numa_cleanup_meminfo() clears the unused memblks before returning. * Check and warn about invalid input parameters in numa_add_memblk(). * Check the maximum number of memblk isn't exceeded in numa_add_memblk(). * numa_cleanup_meminfo() is now called before numa_emulation() so that the emulation code also uses the cleaned up version. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	2e756be447	x86-64, NUMA: make numa_cleanup_meminfo() prettier * Factor out numa_remove_memblk_from(). * Hole detection doesn't need separate start/end. Calculate start/end once. * Relocate comment. * Define iterators at the top and remove unnecessary prefix increments. This prepares for further improvements to the function. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	f9c60251c3	x86-64, NUMA: Separate out numa_cleanup_meminfo() Separate out numa_cleanup_meminfo() from numa_register_memblks(). node_possible_map initialization is moved to the top of the split numa_register_memblks(). This patch doesn't cause behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:09 +01:00
Tejun Heo	97e7b78d06	x86-64, NUMA: Introduce struct numa_meminfo Arrays for memblks and nodeids and their length lived in separate variables making things unnecessarily cumbersome. Introduce struct numa_meminfo which contains all memory configuration info. This patch doesn't cause any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:08 +01:00
Tejun Heo	8968dab8ad	x86-64, NUMA: Remove %NULL @nodeids handling from compute_hash_shift() numa_emulation() called compute_hash_shift() with %NULL @nodeids which meant identity mapping between index and nodeid. Make numa_emulation() build identity array and drop %NULL @nodeids handling from populate_memnodemap() and thus from compute_hash_shift(). This is to prepare for transition to using memblks instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:08 +01:00
Tejun Heo	5d371b08fe	x86-64, NUMA: Kill {acpi\|amd\|dummy}_scan_nodes() They are empty now. Kill them. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:08 +01:00
Tejun Heo	fd0435d8fb	x86-64, NUMA: Unify the rest of memblk registration Move the remaining memblk registration logic from acpi_scan_nodes() to numa_register_memblks() and initmem_init(). This applies nodes_cover_memory() sanity check, memory node sorting and node_online() checking, which were only applied to acpi, to all init methods. As all memblk registration is moved to common code, active range clearing is moved to initmem_init() too and removed from bad_srat(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:08 +01:00
Tejun Heo	43a662f04f	x86-64, NUMA: Unify use of memblk in all init methods Make both amd and dummy use numa_add_memblk() to describe the detected memory blocks. This allows initmem_init() to call numa_register_memblk() regardless of init method in use. Drop custom memory registration codes from amd and dummy. After this change, memblk merge/cleanup in numa_register_memblks() is applied to all init methods. As this makes compute_hash_shift() and numa_register_memblks() used only inside numa_64.c, make them static. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:08 +01:00
Tejun Heo	ef396ec96c	x86-64, NUMA: Factor out memblk handling into numa_{add\|register}_memblk() Factor out memblk handling from srat_64.c into two functions in numa_64.c. This patch doesn't introduce any behavior change. The next patch will make all init methods use these functions. - v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS. Reported by Ingo. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 17:11:07 +01:00
Tejun Heo	1909554870	x86-64, NUMA: Kill {acpi\|amd}_get_nodes() With common numa_nodes[], common code in numa_64.c can access it directly. Copy directly and kill {acpi\|amd}_get_nodes(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:07 +01:00
Tejun Heo	206e42087a	x86-64, NUMA: Use common numa_nodes[] ACPI and amd are using separate nodes[] array. Add numa_nodes[] and use them in all NUMA init methods. cutoff_node() cleanup is moved from srat_64.c to numa_64.c and applied in initmem_init() regardless of init methods. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:07 +01:00
Tejun Heo	45fe6c78c4	x86-64, NUMA: Move apicid to numa mapping initialization from amd_scan_nodes() to amd_numa_init() This brings amd initialization behavior closer to that of acpi. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:07 +01:00
Tejun Heo	99df738cd2	x86-64, NUMA: Remove local variable found from amd_numa_init() Use weight count on mem_nodes_parsed instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:07 +01:00
Tejun Heo	ec8cf29b1d	x86-64, NUMA: Use common {cpu\|mem}_nodes_parsed ACPI and amd are using separate nodes_parsed masks. Add {cpu\|mem}_nodes_parsed and use them in all NUMA init methods. Initialization of the masks and building node_possible_map are now handled commonly by initmem_init(). dummy_numa_init() is updated to set node 0 on both masks. While at it, move the info messages from scan to init. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:07 +01:00
Tejun Heo	ffe77a4605	x86-64, NUMA: Restructure initmem_init() Reorganize initmem_init() such that, * Different NUMA init methods are iterated in a consistent way. * Each iteration re-initializes all the parameters and different method can be tried after a failure. * Dummy init is handled the same as other methods. Apart from how retry after failure, this patch doesn't change the behavior. The call sequences are kept equivalent across the conversion. After the change, bad_srat() doesn't need to clear apic to node mapping or worry about numa_off. Simplified accordingly. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	d8fc3afc49	x86, NUMA: Move *_numa_init() invocations into initmem_init() There's no reason for these to live in setup_arch(). Move them inside initmem_init(). - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Ankita. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	a9aec56afa	x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value Because of the way ACPI tables are parsed, the generic acpi_numa_init() couldn't return failure when error was detected by arch hooks. Instead, the failure state was recorded and later arch dependent init hook - acpi_scan_nodes() - would fail. Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be indicated as return value immediately. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	940fed2e79	x86-64, NUMA: Unify {acpi\|amd}_{numa_init\|scan_nodes}() arguments and return values The functions used during NUMA initialization - _numa_init() and _scan_nodes() - have different arguments and return values. Unify them such that they all take no argument and return 0 on success and -errno on failure. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	86ef4dbf1f	x86, NUMA: Drop @start/last_pfn from initmem_init() initmem_init() extensively accesses and modifies global data structures and the parameters aren't even followed depending on which path is being used. Drop @start/last_pfn and let it deal with @max_pfn directly. This is in preparation for further NUMA init cleanups. - v2: x86-32 initmem_init() weren't updated breaking 32bit builds. Fixed. Found by Yinghai. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	13081df5dd	x86-64, NUMA: Simplify hotplug node handling in acpi_numa_memory_affinity_init() Hotplug node handling in acpi_numa_memory_affinity_init() was unnecessarily complicated with storing the original nodes[] entry and restoring it afterwards. Simplify it by not modifying the nodes[] entry for hotplug nodes from the beginning. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Tejun Heo	7d36b7bc90	x86-64, NUMA: Make dummy node initialization path similar to non-dummy ones Dummy node initialization in initmem_init() didn't initialize apicid to node mapping and set cpu to node mapping directly by caling numa_set_node(), which is different from non-dummy init paths. Update it such that they behave similarly. Initialize apicid to node mapping and call numa_init_array(). The actual cpu to node mapping is handled by init_cpu_to_node() later. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@linux.intel.com>	2011-02-16 12:13:06 +01:00
Ingo Molnar	52b8b8d725	Merge branch 'x86/numa' into x86/mm Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-16 09:44:15 +01:00
Ingo Molnar	02ac81a812	Merge branch 'x86/bootmem' into x86/mm Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree and prevent conflicts. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-16 09:43:54 +01:00
David Rientjes	14392fd329	x86, numa: Add error handling for bad cpu-to-node mappings CONFIG_DEBUG_PER_CPU_MAPS may return NUMA_NO_NODE when an early_cpu_to_node() mapping hasn't been initialized. In such a case, it emits a warning and continues without an issue but callers may try to use the return value to index into an array. We can catch those errors and fail silently since a warning has already been emitted. No current user of numa_add_cpu() requires this error checking to avoid a crash, but it's better to be proactive in case a future user happens to have a bug and a user tries to diagnose it with CONFIG_DEBUG_PER_CPU_MAPS. Reported-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> LKML-Reference: <alpine.DEB.2.00.1102071407250.7812@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-14 13:29:27 +01:00
Ingo Molnar	b366801c95	Merge commit 'v2.6.38-rc4' into x86/numa Merge reason: Merge latest fixes before applying new patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-14 13:28:31 +01:00
Shaohua Li	7064d865af	x86: Avoid tlbstate lock if not enough cpus This one isn't related to previous patch. If online cpus are below NUM_INVALIDATE_TLB_VECTORS, we don't need the lock. The comments in the code declares we don't need the check, but a hot lock still needs an atomic operation and expensive, so add the check here. Uses nr_cpu_ids here as suggested by Eric Dumazet. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andi Kleen <andi@firstfloor.org> LKML-Reference: <1295232730.1949.710.camel@sli10-conroe> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-14 13:03:08 +01:00
Ingo Molnar	d2137d5af4	Merge branch 'linus' into x86/bootmem Conflicts: arch/x86/mm/numa_64.c Merge reason: fix the conflict, update to latest -rc and pick up this dependent fix from Yinghai: `e6d2e2b2b1`: memblock: don't adjust size in memblock_find_base() Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-02-14 11:55:18 +01:00
Nathan Fontenot	1dc41aa6d6	memory hotplug: Define memory_block_size_bytes for x86_64 with CONFIG_X86_UV Define a version of memory_block_size_bytes for x86_64 when CONFIG_X86_UV is set. Signed-off-by: Robin Holt <holt@sgi.com> Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2011-02-03 16:08:58 -08:00
Matthieu CASTET	f12d3d04e8	x86, nx: Don't force pages RW when setting NX bits Xen want page table pages read only. But the initial page table (from head_*.S) live in .data or .bss. That was broken by `64edc8ed5f`. There is absolutely no reason to force these pages RW after they have already been marked RO. Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-02-02 16:02:36 -08:00
Tejun Heo	8db78cc4b4	x86: Unify NUMA initialization between 32 and 64bit Now that everything else is unified, NUMA initialization can be unified too. * numa_init_array() and init_cpu_to_node() are moved from numa_64 to numa. * numa_32::initmem_init() is updated to call numa_init_array() and setup_arch() to call init_cpu_to_node() on 32bit too. * x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on 32bit too. This is safe now as numa_init_array() will initialize it early during boot. This makes NUMA mapping fully initialized before setup_per_cpu_areas() on 32bit too and thus makes the first percpu chunk which contains all the static variables and some of dynamic area allocated with NUMA affinity correctly considered. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-17-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Pekka Enberg <penberg@kernel.org>	2011-01-28 14:54:10 +01:00
Tejun Heo	de2d9445f1	x86: Unify node_to_cpumask_map handling between 32 and 64bit x86_32 has been managing node_to_cpumask_map explicitly from map_cpu_to_node() and friends in a rather ugly way. With previous changes, it's now possible to share the code with 64bit. * When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are implemented in numa.c and shared by 32 and 64bit. CONFIG_NUMA_EMU versions still live in numa_64.c. NUMA_EMU's dependency on 64bit is planned to be removed and the above should go away together. * identify_cpu() now calls numa_add_cpu() for 32bit too. This makes the explicit mask management from map_cpu_to_node() unnecessary. * The whole x86_32 specific map_cpu_to_node() chunk is no longer necessary. Dropped. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-16-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com> Cc: Shaohui Zheng <shaohui.zheng@intel.com>	2011-01-28 14:54:10 +01:00
Tejun Heo	645a79195f	x86: Unify CPU -> NUMA node mapping between 32 and 64bit Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for CPU -> NUMA node mapping. Replace it with early_percpu variable x86_cpu_to_node_map and share the mapping code with 64bit. * USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too. * x86_cpu_to_node_map and numa_set/clear_node() are moved from numa_64 to numa. For now, on 32bit, x86_cpu_to_node_map is initialized with 0 instead of NUMA_NO_NODE. This is to avoid introducing unexpected behavior change and will be updated once init path is unified. * srat_detect_node() is now enabled for x86_32 too. It calls numa_set_node() and initializes the mapping making explicit cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: penberg@kernel.org Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com>	2011-01-28 14:54:09 +01:00
Tejun Heo	bbc9e2f452	x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit The mapping between cpu/apicid and node is done via apicid_to_node[] on 64bit and apicid_2_node[] + apic->x86_32_numa_cpu_node() on 32bit. This difference makes it difficult to further unify 32 and 64bit NUMA handling. This patch unifies it by replacing both apicid_to_node[] and apicid_2_node[] with __apicid_to_node[] array, which is accessed by two accessors - set_apicid_to_node() and numa_cpu_node(). On 64bit, numa_cpu_node() always consults __apicid_to_node[] directly while 32bit goes through apic->numa_cpu_node() method to allow apic implementations to override it. srat_detect_node() for amd cpus contains workaround for broken NUMA configuration which assumes relationship between APIC ID, HT node ID and NUMA topology. Leave it to access __apicid_to_node[] directly as mapping through CPU might result in undesirable behavior change. The comment is reformatted and updated to note the ugliness. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: David Rientjes <rientjes@google.com>	2011-01-28 14:54:09 +01:00
Tejun Heo	b78aa66b1f	x86: Drop x86_32 MAX_APICID Commit `56d91f13` (x86, acpi: Add MAX_LOCAL_APIC for 32bit) added MAX_LOCAL_APIC for x86_32 but didn't replace MAX_APICID users with it. Convert MAX_APICID users to MAX_LOCAL_APIC and drop MAX_APICID. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Pekka Enberg <penberg@kernel.org> Acked-by: Yinghai Lu <yinghai@kernel.org> Cc: eric.dumazet@gmail.com Cc: yinghai@kernel.org Cc: brgerst@gmail.com Cc: gorcunov@gmail.com Cc: shaohui.zheng@intel.com Cc: rientjes@google.com LKML-Reference: <1295789862-25482-3-git-send-email-tj@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-28 14:54:04 +01:00
Jan Beulich	9032160275	x86: Unify "numa=" command line option handling In order to be able to suppress the use of SRAT tables that 32-bit Linux can't deal with (in one case known to lead to a non-bootable system, unless disabling ACPI altogether), move the "numa=" option handling to common code. Signed-off-by: Jan Beulich <jbeulich@novell.com> Reviewed-by: Thomas Renninger <trenn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Renninger <trenn@suse.de> LKML-Reference: <4D36B581020000780002D0FF@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-19 10:25:18 +01:00
Andrea Arcangeli	8ee53820ed	thp: mmu_notifier_test_young For GRU and EPT, we need gup-fast to set referenced bit too (this is why it's correct to return 0 when shadow_access_mask is zero, it requires gup-fast to set the referenced bit). qemu-kvm access already sets the young bit in the pte if it isn't zero-copy, if it's zero copy or a shadow paging EPT minor fault we relay on gup-fast to signal the page is in use... We also need to check the young bits on the secondary pagetables for NPT and not nested shadow mmu as the data may never get accessed again by the primary pte. Without this closer accuracy, we'd have to remove the heuristic that avoids collapsing hugepages in hugepage virtual regions that have not even a single subpage in use. ->test_young is full backwards compatible with GRU and other usages that don't have young bits in pagetables set by the hardware and that should nuke the secondary mmu mappings when ->clear_flush_young runs just like EPT does. Removing the heuristic that checks the young bit in khugepaged/collapse_huge_page completely isn't so bad either probably but I thought it was worth it and this makes it reliable. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:46 -08:00
Johannes Weiner	f2d6bfe9ff	thp: add x86 32bit support Add support for transparent hugepages to x86 32bit. Share the same VM_ bitflag for VM_MAPPED_COPY. mm/nommu.c will never support transparent hugepages. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:44 -08:00
Andrea Arcangeli	64cc6ae001	thp: bail out gup_fast on splitting pmd Force gup_fast to take the slow path and block if the pmd is splitting, not only if it's none. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:40 -08:00
Andrea Arcangeli	db3eb96f4e	thp: add pmd mangling functions to x86 Add needed pmd mangling functions with symmetry with their pte counterparts. pmdp_splitting_flush() is the only new addition on the pmd_ methods and it's needed to serialize the VM against split_huge_page. It simply atomically sets the splitting bit in a similar way pmdp_clear_flush_young atomically clears the accessed bit. pmdp_splitting_flush() also has to flush the tlb to make it effective against gup_fast, but it wouldn't really require to flush the tlb too. Just the tlb flush is the simplest operation we can invoke to serialize pmdp_splitting_flush() against gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:40 -08:00
Andrea Arcangeli	9180706344	thp: alter compound get_page/put_page Alter compound get_page/put_page to keep references on subpages too, in order to allow __split_huge_page_refcount to split an hugepage even while subpages have been pinned by one of the get_user_pages() variants. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-01-13 17:32:39 -08:00
Linus Torvalds	e691d24e9c	Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, olpc: Speed up device tree creation during boot x86, olpc: Add OLPC device-tree support x86, of: Define irq functions to allow drivers/of/* to build on x86	2011-01-13 10:15:12 -08:00
Ingo Molnar	9adcc4a127	Merge branch 'x86/numa' into x86/urgent Merge reason: Topic is ready for upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-10 09:36:07 +01:00
Ingo Molnar	1c2a48cf65	Merge branch 'linus' into x86/apic-cleanups Conflicts: arch/x86/include/asm/io_apic.h Merge reason: Resolve the conflict, update to a more recent -rc base Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 14:14:15 +01:00
David Rientjes	d906f0eb2f	x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation "x86, numa: Fake node-to-cpumask for NUMA emulation" broke the build when CONFIG_DEBUG_PER_CPU_MAPS is set and CONFIG_NUMA_EMU is not. This is because it is possible to map a cpu to multiple nodes when NUMA emulation is used; the patch required a physical node address table to find those nodes that was only available when CONFIG_NUMA_EMU was enabled. This extracts the common debug functionality to its own function for CONFIG_DEBUG_PER_CPU_MAPS and uses it regardless of whether CONFIG_NUMA_EMU is set or not. NUMA emulation will now iterate over the set of possible nodes for each cpu and call the new debug function whereas only the cpu's node will be used without NUMA emulation enabled. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <alpine.DEB.2.00.1012301053590.12995@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-07 14:09:34 +01:00
Linus Torvalds	4f00b901d4	Merge branch 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-security-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: module: Move RO/NX module protection to after ftrace module update x86: Resume trampoline must be executable x86: Add RO/NX protection for loadable kernel modules x86: Add NX protection for kernel data x86: Fix improper large page preservation	2011-01-06 11:07:33 -08:00
Linus Torvalds	37d9a8c5ea	Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix included-by file reference comments x86, cpu: Only CPU features determine NX capabilities x86, cpu: Call verify_cpu during 32bit CPU startup x86, cpu: Clear XD_DISABLED flag on Intel to regain NX x86, cpu: Rename verify_cpu_64.S to verify_cpu.S	2011-01-06 10:56:02 -08:00
Linus Torvalds	017892c341	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation x86, acpi: Add MAX_LOCAL_APIC for 32bit x86: io_apic: Split setup_ioapic_ids_from_mpc() x86: io_apic: Fix CONFIG_X86_IO_APIC=n breakage x86: apic: Move probe_nr_irqs_gsi() into ioapic_init_mappings() x86: Allow platforms to force enable apic	2011-01-06 10:51:36 -08:00
Linus Torvalds	42cbd8efb0	Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cacheinfo: Cleanup L3 cache index disable support x86, amd-nb: Cleanup AMD northbridge caching code x86, amd-nb: Complete the rename of AMD NB and related code	2011-01-06 10:50:28 -08:00
Yinghai Lu	f005fe12b9	x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping() It is not related to init_memory_mapping(), and init_memory_mapping() is getting more bigger. So make it as seperated function and call it from reserve_brk() and that is point when _brk_end is concluded. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933E0.7090305@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-01-05 14:03:45 +01:00
Ingo Molnar	bc030d6cb9	Merge commit 'v2.6.37-rc8' into x86/apic Conflicts: arch/x86/include/asm/io_apic.h Merge reason: move to a fresh -rc, resolve the conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2011-01-04 09:43:42 +01:00
Yinghai Lu	1411e0ec31	x86-64, numa: Put pgtable to local node memory Introduce init_memory_mapping_high(), and use it with 64bit. It will go with every memory segment above 4g to create page table to the memory range itself. before this patch all page tables was on one node. with this patch, one RED-PEN is killed debug out for 8 sockets system after patch [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] RAMDISK: 7bc84000 - 7f745000 .... [ 0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used [ 0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used [ 0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff] [ 0.000000] 0100000000 - 1080000000 page 2M [ 0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff] [ 0.000000] memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff] [ 0.000000] 1080000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff] [ 0.000000] 2080000000 - 3080000000 page 2M [ 0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff] [ 0.000000] memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff] [ 0.000000] 3080000000 - 4080000000 page 2M [ 0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff] [ 0.000000] memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff] [ 0.000000] 4080000000 - 5080000000 page 2M [ 0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff] [ 0.000000] memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff] [ 0.000000] 5080000000 - 6080000000 page 2M [ 0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff] [ 0.000000] memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff] [ 0.000000] 6080000000 - 7080000000 page 2M [ 0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff] [ 0.000000] memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff] PGTABLE [ 0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff] [ 0.000000] 7080000000 - 8080000000 page 2M [ 0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff] [ 0.000000] memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff] PGTABLE [ 0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff] [ 0.000000] NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff] [ 0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff] [ 0.000000] NODE_DATA [0x0000207ffbb000-0x0000207ffbffff] [ 0.000000] Initmem setup node 2 [0000002080000000-000000307fffffff] [ 0.000000] NODE_DATA [0x0000307ffbb000-0x0000307ffbffff] [ 0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff] [ 0.000000] NODE_DATA [0x0000407ffbb000-0x0000407ffbffff] [ 0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff] [ 0.000000] NODE_DATA [0x0000507ffbb000-0x0000507ffbffff] [ 0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff] [ 0.000000] NODE_DATA [0x0000607ffbb000-0x0000607ffbffff] [ 0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff] [ 0.000000] NODE_DATA [0x0000707ffbb000-0x0000707ffbffff] [ 0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff] [ 0.000000] NODE_DATA [0x0000807ffba000-0x0000807ffbefff] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933D1.9020609@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 15:48:08 -08:00
Yinghai Lu	dbef7b56d2	x86-64, numa: Allocate memnodemap under max_pfn_mapped We need to access it right way, so make sure that it is mapped already. Prepare to put page table on local node, and nodemap is used before that. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D1933C8.7060105@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 15:48:08 -08:00
Yinghai Lu	4b239f458c	x86-64, mm: Put early page table high While dubug kdump, found current kernel will have problem with crashkernel=512M. It turns out that initial mapping is to 512M, and later initial mapping to 4G (acutally is 2040M in my platform), will put page table near 512M. then initial mapping to 128g will be near 2g. before this patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x1fffc000-0x1fffffff] [ 0.000000] memblock_x86_reserve_range: [0x1fffc000-0x1fffdfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x7bc01000-0x7bc83fff] [ 0.000000] memblock_x86_reserve_range: [0x7bc01000-0x7bc7efff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] crashkernel reservation failed - No suitable area found. after patch: [ 0.000000] initial memory mapped : 0 - 20000000 [ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff] [ 0.000000] 0000000000 - 007f600000 page 2M [ 0.000000] 007f600000 - 007f750000 page 4k [ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff] [ 0.000000] memblock_x86_reserve_range: [0x7f74c000-0x7f74dfff] PGTABLE [ 0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff] [ 0.000000] 0100000000 - 2080000000 page 2M [ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff] [ 0.000000] memblock_x86_reserve_range: [0x207ff7d000-0x207fffafff] PGTABLE [ 0.000000] RAMDISK: 7bc84000 - 7f745000 [ 0.000000] memblock_x86_reserve_range: [0x17000000-0x36ffffff] CRASH KERNEL [ 0.000000] Reserving 512MB of memory at 368MB for crashkernel (System RAM: 133120MB) It means with the patch, page table for [0, 2g) will need 2g, instead of under 512M, page table for [4g, 128g) will be near 128g, instead of under 2g. That would good, if we have lots of memory above 4g, like 1024g, or 2048g or 16T, will not put related page table under 2g. that would be have chance to fill the under 2g if 1G or 2M page is not used. the code change will use add map_low_page() and update unmap_low_page() for 64bit, and use them to get access the corresponding high memory for page table setting. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0C0734.7060900@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-29 14:46:54 -08:00
H. Peter Anvin	d50e8fc7e3	Merge branch 'x86/apic-cleanups' into x86/numa	2010-12-29 11:36:26 -08:00
David Rientjes	a387e95a49	x86, numa: Fix cpu to node mapping for sparse node ids NUMA boot code assumes that physical node ids start at 0, but the DIMMs that the apic id represents may not be reachable. If this is the case, node 0 is never online and cpus never end up getting appropriately assigned to a node. This causes the cpumask of all online nodes to be empty and machines crash with kernel code assuming online nodes have valid cpus. The fix is to appropriately map all the address ranges for physical nodes and ensure the cpu to node mapping function checks all possible nodes (up to MAX_NUMNODES) instead of simply checking nodes 0-N, where N is the number of physical nodes, for valid address ranges. This requires no longer "compressing" the address ranges of nodes in the physical node map from 0-N, but rather leave indices in physnodes[] to represent the actual node id of the physical node. Accordingly, the topology exported by both amd_get_nodes() and acpi_get_nodes() no longer must return the number of nodes to iterate through; all such iterations will now be to MAX_NUMNODES. This change also passes the end address of system RAM (which may be different from normal operation if mem= is specified on the command line) before the physnodes[] array is populated. ACPI parsed nodes are truncated to fit within the address range that respect the mem= boundaries and even some physical nodes may become unreachable in such cases. When NUMA emulation does succeed, any apicid to node mapping that exists for unreachable nodes are given default values so that proximity domains can still be assigned. This is important for node_distance() to function as desired. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221702090.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:16 -08:00
David Rientjes	c1c3443c9c	x86, numa: Fake node-to-cpumask for NUMA emulation It's necessary to fake the node-to-cpumask mapping so that an emulated node ID returns a cpumask that includes all cpus that have affinity to the memory it represents. This is a little intrusive because it requires knowledge of the physical topology of the system. setup_physnodes() gives us that information, but since NUMA emulation ends up altering the physnodes array, it's necessary to reset it before cpus are brought online. Accordingly, the physnodes array is moved out of init.data and into cpuinit.data since it will be needed on cpuup callbacks. This works regardless of whether numa=fake is used on the command line, or the setup of the fake node succeeds or fails. The physnodes array always contains the physical topology of the machine if CONFIG_NUMA_EMU is enabled and can be used to setup the correct node-to-cpumask mappings in all cases since setup_physnodes() is called whenever the array needs to be repopulated with the correct data. To fake the actual mappings, numa_add_cpu() and numa_remove_cpu() are rewritten for CONFIG_NUMA_EMU so that we first find the physical node to which each cpu has local affinity, then iterate through all online nodes to find the emulated nodes that have local affinity to that physical node, and then finally map the cpu to each of those emulated nodes. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221701520.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:15 -08:00
David Rientjes	f51bf3073a	x86, numa: Fake apicid and pxm mappings for NUMA emulation This patch adds the equivalent of acpi_fake_nodes() for AMD Northbridge platforms. The goal is to fake the apicid-to-node mappings for NUMA emulation so the physical topology of the machine is correctly maintained within the kernel. This change also fakes proximity domains for both ACPI and k8 code so the physical distance between emulated nodes is maintained via node_distance(). This exports the correct distances via /sys/devices/system/node/.../distance based on the underlying topology. A new helper function, fake_physnodes(), is introduced to correctly invoke the correct NUMA code to fake these two mappings based on the system type. If there is no underlying NUMA configuration, all cpus are mapped to node 0 for local distance. Since acpi_fake_nodes() is no longer called with CONFIG_ACPI_NUMA, it's prototype can be removed from the header file for such a configuration. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221701360.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:14 -08:00
David Rientjes	4e76f4e67a	x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU Both acpi_get_nodes() and amd_get_nodes() are only necessary when CONFIG_NUMA_EMU is enabled, so avoid compiling them when the option is disabled. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1012221701210.3701@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 15:27:12 -08:00
Yinghai Lu	d3bd058826	x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation Recent Intel new system have different order in MADT, aka will list all thread0 at first, then all thread1. But SRAT table still old order, it will list cpus in one socket all together. If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed to put some cpus apic id to node mapping into apicid_to_node[]. for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash... [ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS). [ 9.235021] divide error: 0000 [#1] SMP [ 9.235315] last sysfs file: [ 9.235481] CPU 1 [ 9.235592] Modules linked in: [ 9.245398] [ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800 [ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623 ... [ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623 [ 9.665356] RSP <ffff88103f8d1c40> [ 9.665568] ---[ end trace 2296156d35fdfc87 ]--- So let just parse all cpu entries in SRAT. Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of apicid_to_node[]. it fixes following bug too. https://bugzilla.kernel.org/show_bug.cgi?id=22662 -v2: expand to 32bit according to hpa need to add MAX_LOCAL_APIC for 32bit Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com> Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Tested-by: Myron Stowe <myron.stowe@hp.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4D0AD486.9020704@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-23 13:16:18 -08:00
Ingo Molnar	26e20a108c	Merge commit 'v2.6.37-rc7' into x86/security	2010-12-23 09:48:41 +01:00
Andres Salomon	c10d1e260f	x86, olpc: Add OLPC device-tree support Make use of PROC_DEVICETREE to export the tree, and sparc's PROMTREE code to call into OLPC's Open Firmware to build the tree. v5: fix buglet with root node check (introduced in v4) v4: address some minor style issues pointed out by Grant, and explicitly cast negative phandle checks to s32. v3: rename olpc_prom to olpc_dt - rework Kconfig entries - drop devtree build hook from proc, instead adding a call to x86's paging_init (similarly to how sparc64 does it) - switch allocation from using slab to alloc_bootmem. this allows the DT to be built earlier during boot (during setup_arch); the downside is that there are some 1200 bootmem reservations that are done during boot. Not ideal.. - add a helper olpc_ofw_is_installed function to test for the existence and successful detection of OLPC's OFW. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <20101116220952.26526a80@queued.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-12-15 17:11:30 -08:00
Yinghai Lu	f115714163	x86, apic: Remove early_init_lapic_mapping() It is almost the same as smp_register_lapic_addr(). We just need to let smp_read_mpc() call smp_register_lapic_addr() when early==1. Add the apic_printk to smp_register_lapic_address() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> LKML-Reference: <4CFDF681.3030509@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-12-09 21:52:04 +01:00
Ingo Molnar	10a18d7dc0	Merge commit 'v2.6.37-rc5' into perf/core Merge reason: Pick up the latest -rc. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-12-07 07:49:51 +01:00
Lin Ming	691513f70d	x86: Resume trampoline must be executable commit 5bd5a452(x86: Add NX protection for kernel data) marked the trampoline area NX - which unsurprisingly breaks resume and cpu hotplug. Revert the portion of that commit, which touches the trampoline. Originally-from: Lin Ming <ming.m.lin@intel.com> LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com> Cc: Matthieu Castet <castet.matthieu@free.fr> Cc: Siarhei Liakh <sliakh.lkml@gmail.com> Cc: Xuxian Jiang <jiang@cs.ncsu.edu> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Tested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-11-22 14:38:52 +01:00
Ingo Molnar	ae51ce9061	Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/core	2010-11-18 20:07:12 +01:00
Hans Rosenfeld	eec1d4fa00	x86, amd-nb: Complete the rename of AMD NB and related code Not only the naming of the files was confusing, it was even more so for the function and variable names. Renamed the K8 NB and NUMA stuff that is also used on other AMD platforms. This also renames the CONFIG_K8_NUMA option to CONFIG_AMD_NUMA and the related file k8topology_64.c to amdtopology_64.c. No functional changes intended. Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>	2010-11-18 15:53:04 +01:00
Soeren Sandmann Pedersen	9c0729dc80	x86: Eliminate bp argument from the stack tracing routines The various stack tracing routines take a 'bp' argument in which the caller is supposed to provide the base pointer to use, or 0 if doesn't have one. Since bp is garbage whenever CONFIG_FRAME_POINTER is not defined, this means all callers in principle should either always pass 0, or be conditional on CONFIG_FRAME_POINTER. However, there are only really three use cases for stack tracing: (a) Trace the current task, including IRQ stack if any (b) Trace the current task, but skip IRQ stack (c) Trace some other task In all cases, if CONFIG_FRAME_POINTER is not defined, bp should just be 0. If it _is_ defined, then - in case (a) bp should be gotten directly from the CPU's register, so the caller should pass NULL for regs, - in case (b) the caller should should pass the IRQ registers to dump_trace(), - in case (c) bp should be gotten from the top of the task's stack, so the caller should pass NULL for regs. Hence, the bp argument is not necessary because the combination of task and regs is sufficient to determine an appropriate value for bp. This patch introduces a new inline function stack_frame(task, regs) that computes the desired bp. This function is then called from the two versions of dump_stack(). Signed-off-by: Soren Sandmann <ssp@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arjan van de Ven <arjan@infradead.org>, Cc: Frederic Weisbecker <fweisbec@gmail.com>, Cc: Arnaldo Carvalho de Melo <acme@redhat.com>, LKML-Reference: <m3oc9rop28.fsf@dhcp-100-3-82.bos.redhat.com>> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-11-18 14:37:34 +01:00
Matthieu Castet	5bd5a45266	x86: Add NX protection for kernel data This patch expands functionality of CONFIG_DEBUG_RODATA to set main (static) kernel data area as NX. The following steps are taken to achieve this: 1. Linker script is adjusted so .text always starts and ends on a page bound 2. Linker script is adjusted so .rodata always start and end on a page boundary 3. NX is set for all pages from _etext through _end in mark_rodata_ro. 4. free_init_pages() sets released memory NX in arch/x86/mm/init.c 5. bios rom is set to x when pcibios is used. The results of patch application may be observed in the diff of kernel page table dumps: pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc00a0000 640K RW GLB NX pte +0xc00a0000-0xc0100000 384K RW GLB x pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte No pcibios: -- data_nx_pt_before.txt 2009-10-13 07:48:59.000000000 -0400 ++ data_nx_pt_after.txt 2009-10-13 07:26:46.000000000 -0400 0x00000000-0xc0000000 3G pmd ---[ Kernel Mapping ]--- -0xc0000000-0xc0100000 1M RW GLB x pte +0xc0000000-0xc0100000 1M RW GLB NX pte -0xc0100000-0xc03d7000 2908K ro GLB x pte +0xc0100000-0xc0318000 2144K ro GLB x pte +0xc0318000-0xc03d7000 764K ro GLB NX pte -0xc03d7000-0xc0600000 2212K RW GLB x pte +0xc03d7000-0xc0600000 2212K RW GLB NX pte 0xc0600000-0xf7a00000 884M RW PSE GLB NX pmd 0xf7a00000-0xf7bfe000 2040K RW GLB NX pte 0xf7bfe000-0xf7c00000 8K pte The patch has been originally developed for Linux 2.6.34-rc2 x86 by Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>. -v1: initial patch for 2.6.30 -v2: patch for 2.6.31-rc7 -v3: moved all code into arch/x86, adjusted credits -v4: fixed ifdef, removed credits from CREDITS -v5: fixed an address calculation bug in mark_nxdata_nx() -v6: added acked-by and PT dump diff to commit log -v7: minor adjustments for -tip -v8: rework with the merge of "Set first MB as RW+NX" Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Signed-off-by: Matthieu CASTET <castet.matthieu@free.fr> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F82E.60601@free.fr> [ minor cleanliness edits ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 12:52:04 +01:00
matthieu castet	64edc8ed5f	x86: Fix improper large page preservation This patch fixes a bug in try_preserve_large_page() which may result in improper large page preservation and improper application of page attributes to the memory area outside of the original change request. More specifically, the problem manifests itself when set_memory_*() is called for several pages at the beginning of the large page and try_preserve_large_page() erroneously concludes that the change can be applied to whole large page. The fix consists of 3 parts: 1. Addition of "required" protection attributes in static_protections(), so .data and .bss can be guaranteed to stay "RW" 2. static_protections() is now called for every small page within large page to determine compatibility of new protection attributes (instead of just small pages within the requested range). 3. Large page can be preserved only if attribute change is large-page-aligned and covers whole large page. -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2 -v2: Replaced pfn check with address check for kernel rw-data Signed-off-by: Siarhei Liakh <sliakh.lkml@gmail.com> Signed-off-by: Xuxian Jiang <jiang@cs.ncsu.edu> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: James Morris <jmorris@namei.org> Cc: Andi Kleen <ak@muc.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Dave Jones <davej@redhat.com> Cc: Kees Cook <kees.cook@canonical.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CE2F7F3.8030809@free.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 12:52:04 +01:00
Yinghai Lu	9223081f54	x86: Use online node real index in calulate_tbl_offset() Found a NUMA system that doesn't have RAM installed at the first socket which hangs while executing init scripts. bisected it to: \| commit `9329672021` \| Author: Shaohua Li <shaohua.li@intel.com> \| Date: Wed Oct 20 11:07:03 2010 +0800 \| \| x86: Spread tlb flush vector between nodes It turns out when first socket is not online it could have cpus on node1 tlb_offset set to bigger than NUM_INVALIDATE_TLB_VECTORS. That could affect systems like 4 sockets, but socket 2 doesn't have installed, sockets 3 will get too big tlb_offset. Need to use real online node idx. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Shaohua Li <shaohua.li@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <4CDEDE59.40603@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-18 10:10:50 +01:00
Kees Cook	6036f373ea	x86, cpu: Only CPU features determine NX capabilities Fix the NX feature boot warning when NX is missing to correctly reflect that BIOSes cannot disable NX now. Signed-off-by: Kees Cook <kees.cook@canonical.com> LKML-Reference: <1289414154-7829-5-git-send-email-kees.cook@canonical.com> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-11-10 15:43:15 -08:00
Rakib Mullick	cf38d0ba7e	x86, mm: Fix section mismatch in tlb.c Mark tlb_cpuhp_notify as __cpuinit. It's basically a callback function, which is called from __cpuinit init_smp_flash(). So - it's safe. We were warned by the following warning: WARNING: arch/x86/mm/built-in.o(.text+0x356d): Section mismatch in reference from the function tlb_cpuhp_notify() to the function .cpuinit.text:calculate_tlb_offset() The function tlb_cpuhp_notify() references the function __cpuinit calculate_tlb_offset(). This is often because tlb_cpuhp_notify lacks a __cpuinit annotation or the annotation of calculate_tlb_offset is wrong. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <borislav.petkov@amd.com> Cc: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <AANLkTinWQRG=HA9uB3ad0KAqRRTinL6L_4iKgF84coph@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-11-01 10:09:07 +01:00
Linus Torvalds	2d10d8737c	Merge branches 'x86-fixes-for-linus' and 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, alternative: Call stop_machine_text_poke() on all cpus x86-32: Restore irq stacks NUMA-aware allocations x86, memblock: Fix early_node_mem with big reserved region. * 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, uv: More Westmere support on SGI UV x86, uv: Enable Westmere support on SGI UV	2010-10-29 18:58:00 -07:00
Yinghai Lu	419db274be	x86, memblock: Fix early_node_mem with big reserved region. Xen can reserve huge amounts of memory for pre-ballooning, but that still shows as RAM in the e820 memory map. early_node_mem could not find range because of start/end adjusting, and will go through the fallback path. However, the fallback patch is still using memblock_x86_find_range_node(), and it is partially top-down because it go through active_range entries from low to high. Let's use memblock_find_in_range instead memblock_x86_find_range_node. So get real top down in fallback path. We may still need to make memblock_x86_find_range_node to do overall top_down work. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Jeremy Fitzhardinge <jeremy@goop.org> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CC9A9C9.8020700@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-28 15:52:36 -07:00
Zimny Lech	61d8e11e51	Remove duplicate includes from many files Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:18 -07:00
Peter Zijlstra	20273941f2	mm: fix race in kunmap_atomic() Christoph reported a nice splat which illustrated a race in the new stack based kmap_atomic implementation. The problem is that we pop our stack slot before we're completely done resetting its state -- in particular clearing the PTE (sometimes that's CONFIG_DEBUG_HIGHMEM). If an interrupt happens before we actually clear the PTE used for the last slot, that interrupt can reuse the slot in a dirty state, which triggers a BUG in kmap_atomic(). Fix this by introducing kmap_atomic_idx() which reports the current slot index without actually releasing it and use that to find the PTE and delay the _pop() until after we're completely done. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Christoph Hellwig <hch@infradead.org> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-27 18:03:05 -07:00
Michel Lespinasse	68da336a14	x86: access_error API cleanup access_error() already takes error_code as an argument, so there is no need for an additional write flag. Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:09 -07:00
Michel Lespinasse	d065bd810b	mm: retry page fault when blocking on disk transfer This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [akpm@linux-foundation.org: fix warning & crash] Signed-off-by: Michel Lespinasse <walken@google.com> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Ying Han <yinghan@google.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:09 -07:00
Peter Zijlstra	3e4d3af501	mm: stack based kmap_atomic() Keep the current interface but ignore the KM_type and use a stack based approach. The advantage is that we get rid of crappy code like: #define __KM_PTE \ (in_nmi() ? KM_NMI_PTE : \ in_irq() ? KM_IRQ_PTE : \ KM_PTE0) and in general can stop worrying about what context we're in and what kmap slots might be appropriate for that. The downside is that FRV kmap_atomic() gets more expensive. For now we use a CPP trick suggested by Andrew: #define kmap_atomic(page, args...) __kmap_atomic(page) to avoid having to touch all kmap_atomic() users in a single patch. [ not compiled on: - mn10300: the arch doesn't actually build with highmem to begin with ] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix up drivers/gpu/drm/i915/intel_overlay.c] Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Airlie <airlied@linux.ie> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-10-26 16:52:08 -07:00
Linus Torvalds	f1ebdd60cc	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (22 commits) Add _addr_lsb field to ia64 siginfo Fix migration.c compilation on s390 HWPOISON: Remove retry loop for try_to_unmap HWPOISON: Turn addr_valid from bitfield into char HWPOISON: Disable DEBUG by default HWPOISON: Convert pr_debugs to pr_info HWPOISON: Improve comments in memory-failure.c x86: HWPOISON: Report correct address granuality for huge hwpoison faults Encode huge page size for VM_FAULT_HWPOISON errors Fix build error with !CONFIG_MIGRATION hugepage: move is_hugepage_on_freelist inside ifdef to avoid warning Clean up __page_set_anon_rmap HWPOISON, hugetlb: fix unpoison for hugepage HWPOISON, hugetlb: soft offlining for hugepage HWPOSION, hugetlb: recover from free hugepage error when !MF_COUNT_INCREASED hugetlb: move refcounting in hugepage allocation inside hugetlb_lock HWPOISON, hugetlb: add free check to dequeue_hwpoison_huge_page() hugetlb: hugepage migration core hugetlb: redefine hugepage copy functions hugetlb: add allocate function for hugepage migration ...	2010-10-26 10:13:10 -07:00
Linus Torvalds	10f2a2b0f6	Merge branch 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-trampoline-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, mm: Add an initial page table for core bootstrapping	2010-10-22 20:37:50 -07:00
Andi Kleen	46e387bbd8	Merge branch 'hwpoison-hugepages' into hwpoison Conflicts: mm/memory-failure.c	2010-10-22 17:40:48 +02:00
Linus Torvalds	3044100e58	Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits) x86-64: Only set max_pfn_mapped to 512 MiB if we enter via head_64.S xen: Cope with unmapped pages when initializing kernel pagetable memblock, bootmem: Round pfn properly for memory and reserved regions memblock: Annotate memblock functions with __init_memblock memblock: Allow memblock_init to be called early memblock/arm: Fix memblock_region_is_memory() typo x86, memblock: Remove __memblock_x86_find_in_range_size() memblock: Fix wraparound in find_region() x86-32, memblock: Make add_highpages honor early reserved ranges x86, memblock: Fix crashkernel allocation arm, memblock: Fix the sparsemem build memblock: Fix section mismatch warnings powerpc, memblock: Fix memblock API change fallout memblock, microblaze: Fix memblock API change fallout x86: Remove old bootmem code x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve x86: Remove not used early_res code x86, memblock: Replace e820_/_early string with memblock_ x86: Use memblock to replace early_res x86, memblock: Use memblock_debug to control debug message print out ... Fix up trivial conflicts in arch/x86/kernel/setup.c and kernel/Makefile	2010-10-21 18:52:11 -07:00
Linus Torvalds	709d9f54cc	Merge branch 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI x86, vmware: Remove deprecated VMI kernel support Fix up trivial #include conflict in arch/x86/kernel/smpboot.c	2010-10-21 13:53:24 -07:00
Linus Torvalds	c3b86a2942	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 \|\| HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush	2010-10-21 13:47:29 -07:00
Linus Torvalds	d60a2793ba	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove stale pmtimer_64.c x86, cleanups: Use clear_page/copy_page rather than memset/memcpy x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI x86, cleanup: Remove obsolete boot_cpu_id variable	2010-10-21 13:18:06 -07:00
Linus Torvalds	2f0384e5fc	Merge branch 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-amd-nb-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, amd_nb: Enable GART support for AMD family 0x15 CPUs x86, amd: Use compute unit information to determine thread siblings x86, amd: Extract compute unit information for AMD CPUs x86, amd: Add support for CPUID topology extension of AMD CPUs x86, nmi: Support NMI watchdog on newer AMD CPU families x86, mtrr: Assume SYS_CFG[Tom2ForceMemTypeWB] exists on all future AMD CPUs x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB x86, k8-gart: Decouple handling of garts and northbridges x86, cacheinfo: Fix dependency of AMD L3 CID x86, kvm: add new AMD SVM feature bits x86, cpu: Fix allowed CPUID bits for KVM guests x86, cpu: Update AMD CPUID feature bits x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit x86, AMD: Remove needless CPU family check (for L3 cache info) x86, tsc: Remove CPU frequency calibration on AMD	2010-10-21 13:01:08 -07:00
Linus Torvalds	5d70f79b5e	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (163 commits) tracing: Fix compile issue for trace_sched_wakeup.c [S390] hardirq: remove pointless header file includes [IA64] Move local_softirq_pending() definition perf, powerpc: Fix power_pmu_event_init to not use event->ctx ftrace: Remove recursion between recordmcount and scripts/mod/empty jump_label: Add COND_STMT(), reducer wrappery perf: Optimize sw events perf: Use jump_labels to optimize the scheduler hooks jump_label: Add atomic_t interface jump_label: Use more consistent naming perf, hw_breakpoint: Fix crash in hw_breakpoint creation perf: Find task before event alloc perf: Fix task refcount bugs perf: Fix group moving irq_work: Add generic hardirq context callbacks perf_events: Fix transaction recovery in group_sched_in() perf_events: Fix bogus AMD64 generic TLB events perf_events: Fix bogus context time tracking tracing: Remove parent recording in latency tracer graph options tracing: Use one prologue for the preempt irqs off tracer function tracers ...	2010-10-21 12:54:49 -07:00
Shaohua Li	9329672021	x86: Spread tlb flush vector between nodes Currently flush tlb vector allocation is based on below equation: sender = smp_processor_id() % 8 This isn't optimal, CPUs from different node can have the same vector, this causes a lot of lock contention. Instead, we can assign the same vectors to CPUs from the same node, while different node has different vectors. This has below advantages: a. if there is lock contention, the lock contention is between CPUs from one node. This should be much cheaper than the contention between nodes. b. completely avoid lock contention between nodes. This especially benefits kswapd, which is the biggest user of tlb flush, since kswapd sets its affinity to specific node. In my test, this could reduce > 20% CPU overhead in extreme case.The test machine has 4 nodes and each node has 16 CPUs. I then bind each node's kswapd to the first CPU of the node. I run a workload with 4 sequential mmap file read thread. The files are empty sparse file. This workload will trigger a lot of page reclaim and tlbflush. The kswapd bind is to easy trigger the extreme tlb flush lock contention because otherwise kswapd keeps migrating between CPUs of a node and I can't get stable result. Sure in real workload, we can't always see so big tlb flush lock contention, but it's possible. [ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 14:44:42 -07:00
Borislav Petkov	b40827fa72	x86-32, mm: Add an initial page table for core bootstrapping This patch adds an initial page table with low mappings used exclusively for booting APs/resuming after ACPI suspend/machine restart. After this, there's no need to add low mappings to swapper_pg_dir and zap them later or create own swsusp PGD page solely for ACPI sleep needs - we have initial_page_table for that. Signed-off-by: Borislav Petkov <bp@alien8.de> LKML-Reference: <20101020070526.GA9588@liondog.tnic> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 14:23:55 -07:00
H. Peter Anvin	d25e6b0b32	Merge branch 'x86/cleanups' into x86/trampoline	2010-10-20 14:22:45 -07:00
H. Peter Anvin	e44dea35cc	Merge branch 'x86/vmware' into x86/trampoline	2010-10-20 13:18:17 -07:00
Borislav Petkov	f01f7c56a1	x86, mm: Fix incorrect data type in vmalloc_sync_all() arch/x86/mm/fault.c: In function 'vmalloc_sync_all': arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast introduced by `617d34d9e5`. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-20 12:54:04 -07:00
Jeremy Fitzhardinge	617d34d9e5	x86, mm: Hold mm->page_table_lock while doing vmalloc_sync Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <ian.cambell@eu.citrix.com> Originally-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4CB88A4C.1080305@goop.org> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:57:08 -07:00
Jeremy Fitzhardinge	44235dcde4	x86, mm: Fix bogus whitespace in sync_global_pgds() Whitespace cleanup only. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-19 13:56:03 -07:00
Frederic Weisbecker	ebc8827f75	x86: Barf when vmalloc and kmemcheck faults happen in NMI In x86, faults exit by executing the iret instruction, which then reenables NMIs if we faulted in NMI context. Then if a fault happens in NMI, another NMI can nest after the fault exits. But we don't yet support nested NMIs because we have only one NMI stack. To prevent from that, check that vmalloc and kmemcheck faults don't happen in this context. Most of the other kernel faults in NMIs can be more easily spotted by finding explicit copy_from,to_user() calls on review. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>	2010-10-14 20:43:36 +02:00
Jeremy Fitzhardinge	fef5ba7979	xen: Cope with unmapped pages when initializing kernel pagetable Xen requires that all pages containing pagetable entries to be mapped read-only. If pages used for the initial pagetable are already mapped then we can change the mapping to RO. However, if they are initially unmapped, we need to make sure that when they are later mapped, they are also mapped RO. We do this by knowing that the kernel pagetable memory is pre-allocated in the range e820_table_start - e820_table_end, so any pfn within this range should be mapped read-only. However, the pagetable setup code early_ioremaps the pages to write their entries, so we must make sure that mappings created in the early_ioremap fixmap area are mapped RW. (Those mappings are removed before the pages are presented to Xen as pagetable pages.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> LKML-Reference: <4CB63A80.8060702@goop.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-13 16:07:13 -07:00
H. Peter Anvin	8e4029ee35	Merge branch 'x86/urgent' into core/memblock Reason for merge: Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree. Resolved Conflicts: arch/x86/mm/srat_64.c Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 17:05:11 -07:00
Yinghai Lu	73cf624d02	x86, numa: For each node, register the memory blocks actually used Russ reported SGI UV is broken recently. He said: \| The SRAT table shows that memory range is spread over two nodes. \| \| SRAT: Node 0 PXM 0 100000000-800000000 \| SRAT: Node 1 PXM 1 800000000-1000000000 \| SRAT: Node 0 PXM 0 1000000000-1080000000 \| \|Previously, the kernel early_node_map[] would show three entries \|with the proper node. \| \|[ 0.000000] 0: 0x00100000 -> 0x00800000 \|[ 0.000000] 1: 0x00800000 -> 0x01000000 \|[ 0.000000] 0: 0x01000000 -> 0x01080000 \| \|The problem is recent community kernel early_node_map[] shows \|only two entries with the node 0 entry overlapping the node 1 \|entry. \| \| 0: 0x00100000 -> 0x01080000 \| 1: 0x00800000 -> 0x01000000 After looking at the changelog, Found out that it has been broken for a while by following commit \|commit `8716273cae` \|Author: David Rientjes <rientjes@google.com> \|Date: Fri Sep 25 15:20:04 2009 -0700 \| \| x86: Export srat physical topology Before that commit, register_active_regions() is called for every SRAT memory entry right away. Use nodememblk_range[] instead of nodes[] in order to make sure we capture the actual memory blocks registered with each node. nodes[] contains an extended range which spans all memory regions associated with a node, but that does not mean that all the memory in between are included. Reported-by: Russ Anderson <rja@sgi.com> Tested-by: Russ Anderson <rja@sgi.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CB27BDF.5000800@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Cc: <stable@kernel.org> 2.6.33 .34 .35 .36 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-10-11 15:26:15 -07:00
Andi Kleen	f672b49b07	x86: HWPOISON: Report correct address granuality for huge hwpoison faults An earlier patch fixed the hwpoison fault handling to encode the huge page size in the fault code of the page fault handler. This is needed to report this information in SIGBUS to user space. This is a straight forward patch to pass this information through to the signal handling in the x86 specific fault.c Cc: x86@kernel.org Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: fengguang.wu@intel.com Signed-off-by: Andi Kleen <ak@linux.intel.com>	2010-10-08 09:32:46 +02:00
Ingo Molnar	153db80f8c	Merge commit 'v2.6.36-rc7' into core/memblock Merge reason: Update from -rc3 to -rc7. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-08 09:15:00 +02:00
Yinghai Lu	16c36f743b	x86, memblock: Remove __memblock_x86_find_in_range_size() Fold it into memblock_x86_find_in_range(), and change bad_addr_size() to check_reserve_memblock(). So whole memblock_x86_find_in_range_size() code is more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CAA4DEC.4000401@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:45:43 -07:00
Yinghai Lu	1d931264af	x86-32, memblock: Make add_highpages honor early reserved ranges Originally the only early reserved range that is overlapped with high pages is "KVA RAM", but we already do remove that from the active ranges. However, It turns out Xen could have that kind of overlapping to support memory ballooning.x So we need to make add_highpage_with_active_regions() to subtract memblock reserved just like low ram; this is the proper design anyway. In this patch, refactering get_freel_all_memory_range() to make it can be used by add_highpage_with_active_regions(). Also we don't need to remove "KVA RAM" from active ranges. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4CABB183.1040607@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-10-05 21:44:35 -07:00
Jan Beulich	234bb549ee	x86, cleanups: Use clear_page/copy_page rather than memset/memcpy When operating on whole pages, use clear_page() and copy_page() in favor of memset() and memcpy(); after all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-22 15:36:49 -07:00
Andreas Herrmann	23ac4ae827	x86, k8: Rename k8.[ch] to amd_nb.[ch] and CONFIG_K8_NB to CONFIG_AMD_NB The file names are somehow misleading as the code is not specific to AMD K8 CPUs anymore. The files accomodate code for other AMD CPU northbridges as well. Same is true for the config option which is valid for AMD CPU northbridges in general and not specific to K8. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> LKML-Reference: <20100917160343.GD4958@loge.amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-09-20 14:22:58 -07:00
Francisco Jerez	cc1a8e5233	x86: Fix the address space annotations of iomap_atomic_prot_pfn() This patch fixes the sparse warnings when the return pointer of iomap_atomic_prot_pfn() is used as an argument of iowrite32() and friends. Signed-off-by: Francisco Jerez <currojerez@riseup.net> LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-05 14:26:14 +02:00
Wu Fengguang	1c5f50ee34	x86, mm: fix uninitialized addr in kernel_physical_mapping_init() This re-adds the lost chunk in commit `9b861528a8`. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Haicheng Li <haicheng.li@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20100903090407.GA19771@localhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-09-03 11:40:11 +02:00
Ingo Molnar	daab7fc734	Merge commit 'v2.6.36-rc3' into x86/memblock Conflicts: arch/x86/kernel/trampoline.c mm/memblock.c Merge reason: Resolve the conflicts, update to latest upstream. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-31 09:45:46 +02:00
Julia Lawall	9fbaf49c7f	x86, kmemcheck: Remove double test The opcodes 0x2e and 0x3e are tested for in the first Group 2 line as well. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @expression@ expression E; @@ ( * E \|\| ... \|\| E \| * E && ... && E ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> LKML-Reference: <1283010066-20935-5-git-send-email-julia@diku.dk> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-08-30 09:19:28 +02:00
Yinghai Lu	774ea0bcb2	x86: Remove old bootmem code Requested by Ingo, Thomas and HPA. The old bootmem code is no longer necessary, and the transition is complete. Remove it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:14:37 -07:00
Yinghai Lu	6f2a75369e	x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve memblock_memory_size() will return memory size in memblock.memory.region. memblock_free_memory_size() will return free memory size in memblock.memory.region. So We can get exact reseved size in specified range. Set the size right after initmem_init(), because later bootmem API will get area above 16M. (except some fallback). Later after we remove the bootmem, We could call that just before paging_init(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:54 -07:00
Yinghai Lu	a9ce6bc151	x86, memblock: Replace e820_/_early string with memblock_ 1.include linux/memblock.h directly. so later could reduce e820.h reference. 2 this patch is done by sed scripts mainly -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:13:47 -07:00
Yinghai Lu	72d7c3b33c	x86: Use memblock to replace early_res 1. replace find_e820_area with memblock_find_in_range 2. replace reserve_early with memblock_x86_reserve_range 3. replace free_early with memblock_x86_free_range. 4. NO_BOOTMEM will switch to use memblock too. 5. use _e820, _early wrap in the patch, in following patch, will replace them all 6. because memblock_x86_free_range support partial free, we can remove some special care 7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill() so adjust some calling later in setup.c::setup_arch() -- corruption_check and mptable_update -v2: Move reserve_brk() early Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range() that could happen We have more then 128 RAM entry in E820 tables, and memblock_x86_fill() could use memblock_find_in_range() to find a new place for memblock.memory.region array. and We don't need to use extend_brk() after fill_memblock_area() So move reserve_brk() early before fill_memblock_area(). -v3: Move find_smp_config early To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable in right place. -v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in memblock.reserved already.. use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later. -v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit active_region for 32bit does include high pages need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped() -v6: Use current_limit instead -v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L -v8: Set memblock_can_resize early to handle EFI with more RAM entries -v9: update after kmemleak changes in mainline Suggested-by: David S. Miller <davem@davemloft.net> Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:12:29 -07:00
Yinghai Lu	301ff3e88e	x86, memblock: Use memblock_debug to control debug message print out Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is specified will also print ther name when reserve_memblock_area/free_memblock_area are called. -v2: according to Ingo, put " if (memblock_debug) " in one place Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:11:35 -07:00
Yinghai Lu	e82d42be24	x86, memblock: Add memblock_x86_memory_in_range() It will return memory size in specified range according to memblock.memory.region Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to __memblock_x86_memory_in_range(). -v2: Ben want _in_range in the name instead of size Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:11:30 -07:00
Yinghai Lu	b52c17ce85	x86, memblock: Add memblock_x86_free_memory_in_range() It will return free memory size in specified range. We can not use memory_size - reserved_size here, because some reserved area may not be in the scope of memblock.memory.region. Use memblock.memory.region subtracting memblock.reserved.region to get free range array. then count size of all free ranges. -v2: Ben insist on using _in_range Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:11:16 -07:00
Yinghai Lu	6bcc8176d0	x86, memblock: Add memblock_x86_find_in_range_node() It can be used to find NODE_DATA for numa. Need to make sure early_node_map[] is filled before it is called, otherwise it will fallback to memblock_find_in_range(), with node range. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:57 -07:00
Yinghai Lu	88ba088c18	x86, memblock: Add memblock_x86_register_active_regions() and memblock_x86_hole_size() memblock_x86_register_active_regions() will be used to fill early_node_map, the result will be memblock.memory.region AND numa data memblock_x86_hole_size will be used to find hole size on memblock.memory.region with specified range. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:52 -07:00
Yinghai Lu	4d5cf86ce1	x86, memblock: Add get_free_all_memory_range() get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by free_all_memory_core_early(). It will use early_node_map aka active ranges subtract memblock.reserved to get all free range, and those ranges will convert to slab pages. -v4: increase range size Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:48 -07:00
Yinghai Lu	9dc5d569c1	x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_range They are wrappers for core versions, which take start/end/name instead of base/size. This will make x86 conversion eaasier. could add more debug print out -v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael Ellerman and Ben change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman -v3: call check_and_double after reserve/free, so could avoid to use find_memblock_area. Suggested by Michael Ellerman Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:10:41 -07:00
Yinghai Lu	27de794365	x86, memblock: Add memblock_x86_to_bootmem() memblock_x86_to_bootmem() will reserve memblock.reserved.region in bootmem after bootmem is set up. We can use it to with all arches that support memblock later. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:21 -07:00
Yinghai Lu	f88eff74aa	bootmem, x86: Add weak version of reserve_bootmem_generic It will be used memblock_x86_to_bootmem converting It is an wrapper for reserve_bootmem, and x86 64bit is using special one. Also clean up that version for x86_64. We don't need to take care of numa path for that, bootmem can handle it how Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:13 -07:00
Yinghai Lu	fb74fb6db9	x86, memblock: Add memblock_x86_find_in_range_size() size is returned according free range. Will be used to find free ranges for early_memtest and memory corruption check Do not mess it up with lib/memblock.c yet. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-08-27 11:08:06 -07:00
Shaohua Li	660a293ea9	x86, mm: Make spurious_fault check explicitly check the PRESENT bit pte_present() returns true even present bit isn't set but _PAGE_PROTNONE (global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have global bit set but present bit clear. This patch makes we could catch free pages access with CONFIG_DEBUG_PAGEALLOC enabled. [ hpa: added a comment in the code as a warning to janitors ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <1280217988.32400.75.camel@sli10-desk.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 16:00:21 -07:00
Haicheng Li	9b861528a8	x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes When memory hotplug-adding happens for a large enough area that a new PGD entry is needed for the direct mapping, the PGDs of other processes would not get updated. This leads to some CPUs oopsing like below when they have to access the unmapped areas. [ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534] [ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243229] irq event stamp: 8538759 [ 1139.243230] hardirqs last enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30 [ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146 [ 1139.243240] softirqs last enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146 [ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34 [ 1139.243249] CPU 0: [ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore [ 1139.243284] Pid: 6534, comm: bash Tainted: G M 2.6.32-haicheng-cpuhp #7 QSSC-S4R [ 1139.243287] RIP: 0010:[<ffffffff810ace35>] [<ffffffff810ace35>] alloc_arraycache+0x35/0x69 [ 1139.243292] RSP: 0018:ffff8802799f9d78 EFLAGS: 00010286 [ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000 [ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010 [ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000 [ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0 [ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001 [ 1139.243308] FS: 00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000 [ 1139.243311] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0 [ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 1139.243321] Call Trace: [ 1139.243324] [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69 [ 1139.243328] [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a [ 1139.243333] [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b [ 1139.243337] [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb [ 1139.243340] [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152 [ 1139.243344] [<ffffffff813388ce>] ? store_online+0x4d/0x75 [ 1139.243348] [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d [ 1139.243351] [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121 [ 1139.243355] [<ffffffff810b539d>] ? vfs_write+0xae/0x14a [ 1139.243358] [<ffffffff810b587f>] ? sys_write+0x47/0x6f [ 1139.243362] [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b This patch makes sure to always replicate new direct mapping PGD entries to the PGDs of all processes, as well as ensures corresponding vmemmap mapping gets synced. V1: initial code by Andi Kleen. V2: fix several issues found in testing. V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all(). [ hpa: changed pgd_change from int to bool ] Originally-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4FD8.6080100@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 14:02:33 -07:00
Haicheng Li	6afb5157b9	x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <4C6E4ECD.1090607@linux.intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-26 14:02:29 -07:00
Alok Kataria	b0f4c062fb	x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI VMI was the only user of the alloc_pmd_clone hook, given that VMI is now removed we can also remove this hook. Signed-off-by: Alok N Kataria <akataria@vmware.com> LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com> Cc: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-23 17:09:44 -07:00
Linus Torvalds	9605456919	x86: don't send SIGBUS for kernel page faults It's wrong for several reasons, but the most direct one is that the fault may be for the stack accesses to set up a previous SIGBUS. When we have a kernel exception, the kernel exception handler does all the fixups, not some user-level signal handler. Even apart from the nested SIGBUS issue, it's also wrong to give out kernel fault addresses in the signal handler info block, or to send a SIGBUS when a system call already returns EFAULT. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-13 09:49:20 -07:00
Robert Richter	f6e9456c92	x86, cleanup: Remove obsolete boot_cpu_id variable boot_cpu_id is there for historical reasons and was renamed to boot_cpu_physical_apicid in patch: `c70dcb7` x86: change boot_cpu_id to boot_cpu_physical_apicid However, there are some remaining occurrences of boot_cpu_id that are never touched in the kernel and thus its value is always 0. This patch removes boot_cpu_id completely. Signed-off-by: Robert Richter <robert.richter@amd.com> LKML-Reference: <1279731838-1522-8-git-send-email-robert.richter@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-08-12 14:01:38 -07:00
Cesar Eduardo Barros	597781f3e5	kmap_atomic: make kunmap_atomic() harder to misuse kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse" list[1] ("Follow common convention and you'll get it wrong"), except in some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3]. kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes takes a pointer to within the page itself. This seems to once in a while trip people up (the convention they are following is the one from kunmap()). Make it much harder to misuse, by moving it to level 9 on Rusty's list[4] ("The compiler/linker won't let you get it wrong"). This is done by refusing to build if the type of its first argument is a pointer to a struct page. The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck() (which is what you would call in case for some strange reason calling it with a pointer to a struct page is not incorrect in your code). The previous version of this patch was compile tested on x86-64. [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html [2] In these cases, it is at level 5, "Do it right or it will always break at runtime." [3] At least mips and powerpc look very similar, and sparc also seems to share a common ancestor with both; there seems to be quite some degree of copy-and-paste coding here. The include/asm/highmem.h file for these three archs mention x86 CPUs at its top. [4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html [5] As an aside, could someone tell me why mn10300 uses unsigned long as the first parameter of kunmap_atomic() instead of void *? Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net> Cc: Russell King <linux@arm.linux.org.uk> (arch/arm) Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips) Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300) Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300) Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc) Cc: Helge Deller <deller@gmx.de> (arch/parisc) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc) Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc) Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc) Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86) Cc: Ingo Molnar <mingo@redhat.com> (arch/x86) Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86) Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic) Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list) Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-08-09 20:44:54 -07:00
Linus Torvalds	9faa1e5942	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Ioremap: fix wrong physical address handling in PAT code x86, tlb: Clean up and correct used type x86, iomap: Fix wrong page aligned size calculation in ioremapping code x86, mm: Create symbolic index into address_markers array x86, ioremap: Fix normal ram range check x86, ioremap: Fix incorrect physical address handling in PAE mode x86-64, mm: Initialize VDSO earlier on 64 bits x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages	2010-08-06 10:17:52 -07:00
Linus Torvalds	4aed2fd8e3	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits) tracing/kprobes: unregister_trace_probe needs to be called under mutex perf: expose event__process function perf events: Fix mmap offset determination perf, powerpc: fsl_emb: Restore setting perf_sample_data.period perf, powerpc: Convert the FSL driver to use local64_t perf tools: Don't keep unreferenced maps when unmaps are detected perf session: Invalidate last_match when removing threads from rb_tree perf session: Free the ref_reloc_sym memory at the right place x86,mmiotrace: Add support for tracing STOS instruction perf, sched migration: Librarize task states and event headers helpers perf, sched migration: Librarize the GUI class perf, sched migration: Make the GUI class client agnostic perf, sched migration: Make it vertically scrollable perf, sched migration: Parameterize cpu height and spacing perf, sched migration: Fix key bindings perf, sched migration: Ignore unhandled task states perf, sched migration: Handle ignored migrate out events perf: New migration tool overview tracing: Drop cpparg() macro perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call ... Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c	2010-08-06 09:30:52 -07:00
Jiri Kosina	d790d4d583	Merge branch 'master' into for-next	2010-08-04 15:14:38 +02:00
Marcin Slusarz	cc05152ab7	x86,mmiotrace: Add support for tracing STOS instruction Add support for stos access tracing with mmiotrace. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Nouveau <nouveau@lists.freedesktop.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20100731205101.GA5860@joi.lan> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2010-08-02 01:32:01 +02:00
Yasuaki Ishimatsu	3709c85735	x86: Ioremap: fix wrong physical address handling in PAT code The following two commits fixed a problem that x86 ioremap() doesn't handle physical address higher than 32-bit properly in X86_32 PAE mode. `ffa71f33a8` (x86, ioremap: Fix incorrect physical address handling in PAE mode) `35be1b716a` (x86, ioremap: Fix normal ram range check) But these fixes are not enough, since pat_pagerange_is_ram() in PAT code also has a same problem. This patch fixes it. Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C47DDCF.80300@jp.fujitsu.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-07-29 15:26:11 +02:00
Borislav Petkov	3f8afb77cd	x86, tlb: Clean up and correct used type smp_processor_id() returns an int and not an unsigned long. Also, since the function is small enough, there's no need for a local variable caching its value. No functionality change, just cleanup. Signed-off-by: Borislav Petkov <borislav.petkov@amd.com> LKML-Reference: <20100721124705.GA674@aftab> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-21 21:48:15 +02:00
Florian Zumbiehl	468c30f2bb	x86, iomap: Fix wrong page aligned size calculation in ioremapping code x86 early_iounmap(): fix off-by-one error in page alignment of allocation size for sizes where size%PAGE_SIZE==1. Signed-off-by: Florian Zumbiehl <florz@florz.de> LKML-Reference: <201007202219.o6KMJlES021058@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-20 16:56:35 -07:00
Andres Salomon	92851e2fca	x86, mm: Create symbolic index into address_markers array Without this, adding entries into the address_markers array means adding more and more of an #ifdef maze in pt_dump_init(). By using indices, we can keep it a bit saner. Signed-off-by: Andres Salomon <dilinger@queued.net> LKML-Reference: <201007202219.o6KMJkUs021052@imap1.linux-foundation.org> Cc: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-20 16:56:19 -07:00
Pavel Machek	a2531293db	update email address pavel@suse.cz no longer works, replace it with working address. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-07-19 10:56:54 +02:00
Kenji Kaneshige	35be1b716a	x86, ioremap: Fix normal ram range check Check for normal RAM in x86 ioremap() code seems to not work for the last page frame in the specified physical address range. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C1AE6CD.1080704@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-09 11:42:11 -07:00
Kenji Kaneshige	ffa71f33a8	x86, ioremap: Fix incorrect physical address handling in PAE mode Current x86 ioremap() doesn't handle physical address higher than 32-bit properly in X86_32 PAE mode. When physical address higher than 32-bit is passed to ioremap(), higher 32-bits in physical address is cleared wrongly. Due to this bug, ioremap() can map wrong address to linear address space. In my case, 64-bit MMIO region was assigned to a PCI device (ioat device) on my system. Because of the ioremap()'s bug, wrong physical address (instead of MMIO region) was mapped to linear address space. Because of this, loading ioatdma driver caused unexpected behavior (kernel panic, kernel hangup, ...). Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> LKML-Reference: <4C1AE680.7090408@jp.fujitsu.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-07-09 11:42:03 -07:00
Peter Zijlstra	b945d6b255	rbtree: Undo augmented trees performance damage and regression Reimplement augmented RB-trees without sprinkling extra branches all over the RB-tree code (which lives in the scheduler hot path). This approach is 'borrowed' from Fabio's BFQ implementation and relies on traversing the rebalance path after the RB-tree-op to correct the heap property for insertion/removal and make up for the damage done by the tree rotations. For insertion the rebalance path is trivially that from the new node upwards to the root, for removal it is that from the deepest node in the path from the to be removed node that will still be around after the removal. [ This patch also fixes a video driver regression reported by Ali Gholami Rudi - the memtype->subtree_max_end was updated incorrectly. ] Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Ali Gholami Rudi <ali@rudi.ir> Cc: Fabio Checconi <fabio@gandalf.sssup.it> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1275414172.27810.27961.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-07-05 14:43:50 +02:00
Marcin Slusarz	8b8f79b927	x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages After every iounmap mmiotrace has to free kmmio_fault_pages, but it can't do it directly, so it defers freeing by RCU. It usually works, but when mmiotraced code calls ioremap-iounmap multiple times without sleeping between (so RCU won't kick in and start freeing) it can be given the same virtual address, so at every iounmap mmiotrace will schedule the same pages for release. Obviously it will explode on second free. Fix it by marking kmmio_fault_pages which are scheduled for release and not adding them second time. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Tested-by: Marcin Kocielnicki <koriakin@0x04.net> Tested-by: Shinpei KATO <shinpei@il.is.s.u-tokyo.ac.jp> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Marcin Kocielnicki <koriakin@0x04.net> Cc: nouveau@lists.freedesktop.org Cc: <stable@kernel.org> LKML-Reference: <20100613215654.GA3829@joi.lan> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-06-18 11:30:09 +02:00
Venkatesh Pallipadi	6a4f3b5237	x86, pat: Proper init of memtype subtree_max_end subtree_max_end that was recently added to struct memtype was not getting properly initialized resulting in WARNING: kmemcheck: Caught 64-bit read from uninitialized memory in memtype_rb_augment_cb() reported here https://bugzilla.kernel.org/show_bug.cgi?id=16092 This change fixes the problem. Reported-by: Christian Casteyde <casteyde.christian@free.fr> Tested-by: Christian Casteyde <casteyde.christian@free.fr> Signed-off-by: Venkatesh Pallipadi <venki@google.com> LKML-Reference: <1276217101-11515-1-git-send-email-venki@google.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com>	2010-06-11 14:12:22 -07:00
Akinobu Mita	e565813ab9	x86/mm: Remove unused DBG() macro DBG() macro for CONFIG_DEBUG_PER_CPU_MAPS is unused. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <1274706291-13554-1-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-31 10:01:53 +02:00
Linus Torvalds	021fad8b70	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, cpufeature: Unbreak compile with gcc 3.x x86, pat: Fix memory leak in free_memtype x86, k8: Fix section mismatch for powernowk8_exit() lib/atomic64_test: fix missing include of linux/kernel.h x86: remove last traces of quicklist usage x86, setup: Phoenix BIOS fixup is needed on Dell Inspiron Mini 1012 x86: "nosmp" command line option should force the system into UP mode arch/x86/pci: use kasprintf x86, apic: ack all pending irqs when crashed/on kexec	2010-05-30 09:06:13 -07:00
Linus Torvalds	35926ff5fb	Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()" This reverts commit `0ac0c0d0f8`, which caused cross-architecture build problems for all the wrong reasons. IA64 already added its own version of __node_random(), but the fact is, there is nothing architectural about the function, and the original commit was just badly done. Revert it, since no fix is forthcoming. Requested-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-30 09:00:03 -07:00
Linus Torvalds	c5617b200a	Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits) tracing: Add __used annotation to event variable perf, trace: Fix !x86 build bug perf report: Support multiple events on the TUI perf annotate: Fix up usage of the build id cache x86/mmiotrace: Remove redundant instruction prefix checks perf annotate: Add TUI interface perf tui: Remove annotate from popup menu after failure perf report: Don't start the TUI if -D is used perf: Fix getline undeclared perf: Optimize perf_tp_event_match() perf: Remove more code from the fastpath perf: Optimize the !vmalloc backed buffer perf: Optimize perf_output_copy() perf: Fix wakeup storm for RO mmap()s perf-record: Share per-cpu buffers perf-record: Remove -M perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig ...	2010-05-27 15:23:47 -07:00
Lee Schermerhorn	e534c7c5f8	numa: x86_64: use generic percpu var numa_node_id() implementation x86 arch specific changes to use generic numa_node_id() based on generic percpu variable infrastructure. Back out x86's custom version of numa_node_id() Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Tejun Heo <tj@kernel.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Nick Piggin <npiggin@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:57 -07:00
Jack Steiner	0ac0c0d0f8	cpusets: randomize node rotor used in cpuset_mem_spread_node() Some workloads that create a large number of small files tend to assign too many pages to node 0 (multi-node systems). Part of the reason is that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at node 0 for newly created tasks. This patch changes the rotor to be initialized to a random node number of the cpuset. [akpm@linux-foundation.org: fix layout] [Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration] Signed-off-by: Jack Steiner <steiner@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Paul Menage <menage@google.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Robin Holt <holt@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:44 -07:00
Xiaotian Feng	20413f2716	x86, pat: Fix memory leak in free_memtype Reserve_memtype will allocate memory for new memtype, but in free_memtype, after the memtype erased from rbtree, the memory is not freed. Changes since V1: make rbt_memtype_erase return erased memtype so that it can be freed in free_memtype. [ hpa: not for -stable: 2.6.34 and earlier not affected ] Signed-off-by: Xiaotian Feng <dfeng@redhat.com> LKML-Reference: <1274838670-8731-1-git-send-email-dfeng@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-26 11:26:04 -07:00
Peter Zijlstra	48691ff86d	x86: remove last traces of quicklist usage We still have a stray quicklist header included even though we axed quicklist usage quite a while back. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <201005241913.o4OJDJe9010881@imap1.linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2010-05-24 13:33:31 -07:00
Akinobu Mita	841bca1393	x86/mmiotrace: Remove redundant instruction prefix checks Get rid of the duplicated entries in prefix_codes[] to eliminate redundant checks by skip_prefix(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Pekka Paalanen <pq@iki.fi> LKML-Reference: <1274140110-5841-1-git-send-email-akinobu.mita@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-23 11:02:43 +02:00
Linus Torvalds	59534f7298	Merge branch 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-for-2.6.35' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (207 commits) drm/radeon/kms/pm/r600: select the mid clock mode for single head low profile drm/radeon: fix power supply kconfig interaction. drm/radeon/kms: record object that have been list reserved drm/radeon: AGP memory is only I/O if the aperture can be mapped by the CPU. drm/radeon/kms: don't default display priority to high on rs4xx drm/edid: fix typo in 1600x1200@75 mode drm/nouveau: fix i2c-related init table handlers drm/nouveau: support init table i2c device identifier 0x81 drm/nouveau: ensure we've parsed i2c table entry for INIT_I2C handlers drm/nouveau: display error message for any failed init table opcode drm/nouveau: fix init table handlers to return proper error codes drm/nv50: support fractional feedback divider on newer chips drm/nv50: fix monitor detection on certain chipsets drm/nv50: store full dcb i2c entry from vbios drm/nv50: fix suspend/resume with DP outputs drm/nv50: output calculated crtc pll when debugging on drm/nouveau: dump pll limits entries when debugging is on drm/nouveau: bios parser fixes for eDP boards drm/nouveau: fix a nouveau_bo dereference after it's been destroyed drm/nv40: remove some completed ctxprog TODOs ...	2010-05-21 11:14:52 -07:00
Linus Torvalds	c4fd308ed6	Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Update the page flags for memtype atomically instead of using memtype_lock x86, pat: In rbt_memtype_check_insert(), update new->type only if valid x86, pat: Migrate to rbtree only backend for pat memtype management x86, pat: Preparatory changes in pat.c for bigger rbtree change rbtree: Add support for augmented rbtrees	2010-05-18 09:28:04 -07:00
Linus Torvalds	1f8caa986a	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64: Combine SRAT regions when possible	2010-05-18 09:17:50 -07:00
Eric Anholt	34dc4d4423	Merge remote branch 'origin/master' into drm-intel-next Conflicts: drivers/gpu/drm/i915/i915_dma.c drivers/gpu/drm/i915/i915_drv.h drivers/gpu/drm/radeon/r300.c The BSD ringbuffer support that is landing in this branch significantly conflicts with the Ironlake PIPE_CONTROL fix on master, and requires it to be tested successfully anyway.	2010-05-10 13:36:52 -07:00
David Rientjes	b0c4d952a1	x86: Fix fake apicid to node mapping for numa emulation With NUMA emulation, it's possible for a single cpu to be bound to multiple nodes since more than one may have affinity if allocated on a physical node that is local to the cpu. APIC ids must therefore be mapped to the lowest node ids to maintain generic kernel use of functions such as cpu_to_node() that determine device affinity. For example, if a device has proximity to physical node 1, for instance, and a cpu happens to be mapped to a higher emulated node id 8, the proximity may not be correctly determined by comparison in generic code even though the cpu may be truly local and allocated on physical node 1. When this happens, the true topology of the machine isn't accurately represented in the emulated environment; although this isn't critical to the system's uptime, any generic code that is NUMA aware benefits from the physical topology being accurately represented. This can affect any system that maps multiple APIC ids to a single node and is booted with numa=fake=N where N is greater than the number of physical nodes. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <alpine.DEB.2.00.1005060224140.19473@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-06 12:02:05 +02:00
Ingo Molnar	56f0e74c9c	x86: Fix parse_reservetop() build failure on certain configs Commit `e67a807` ("x86: Fix 'reservetop=' functionality") added a fixup_early_ioremap() call to parse_reservetop() and declared it in io.h. But asm/io.h was only included indirectly - and on some configs not at all, causing a build failure on those configs. Cc: Liang Li <liang.li@windriver.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Wang Chen <wangchen@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-05-03 09:22:19 +02:00
Liang Li	e67a807f3d	x86: Fix 'reservetop=' functionality When specifying the 'reservetop=0xbadc0de' kernel parameter, the kernel will stop booting due to a early_ioremap bug that relates to commit `8827247ff`. The root cause of boot failure problem is the value of 'slot_virt[i]' was initialized in setup_arch->early_ioremap_init(). But later in setup_arch, the function 'parse_early_param' will modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified. The simplest fix might be use __fix_to_virt(idx0) to get updated value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference old value from slot_virt[slot] directly. Changelog since v0: -v1: When reservetop being handled then FIXADDR_TOP get adjusted, Hence check prev_map then re-initialize slot_virt and PMD based on new FIXADDR_TOP. -v2: place fixup_early_ioremap hence call early_ioremap_init in reserve_top_address to re-initialize slot_virt and corresponding PMD when parse_reservertop -v3: move fixup_early_ioremap out of reserve_top_address to make sure other clients of reserve_top_address like xen/lguest won't broken Signed-off-by: Liang Li <liang.li@windriver.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Wang Chen <wangchen@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com> [ fixed three small cleanliness details in fixup_early_ioremap() ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-30 12:19:53 +02:00
Jan Beulich	2e61878698	x86-64: Combine SRAT regions when possible ... i.e. when the hole between two regions isn't occupied by memory on another node. This reduces the memory->node table size, thus reducing cache footprint of lookups, which got increased significantly some time ago, and things go back to how they were before that change on the systems I looked at. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4BCF3230020000780003B3CA@vpn.id2.novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-04-28 17:14:11 -07:00
Robin Holt	1f9cc3cb6a	x86, pat: Update the page flags for memtype atomically instead of using memtype_lock While testing an application using the xpmem (out of kernel) driver, we noticed a significant page fault rate reduction of x86_64 with respect to ia64. For one test running with 32 cpus, one thread per cpu, it took 01:08 for each of the threads to vm_insert_pfn 2GB worth of pages. For the same test running on 256 cpus, one thread per cpu, it took 14:48 to vm_insert_pfn 2 GB worth of pages. The slowdown was tracked to lookup_memtype which acquires the spinlock memtype_lock. This heavily contended lock was slowing down vm_insert_pfn(). With the cmpxchg on page->flags method, both the 32 cpu and 256 cpu cases take approx 00:01.3 seconds to complete. Signed-off-by: Robin Holt <holt@sgi.com> LKML-Reference: <20100423153627.751194346@gulag1.americas.sgi.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Cc: Rafael Wysocki <rjw@novell.com> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-04-23 15:57:23 -07:00
Dave Airlie	c2b41276da	Merge branch 'drm-ttm-pool' into drm-core-next * drm-ttm-pool: drm/ttm: using kmalloc/kfree requires including slab.h drm/ttm: include linux/seq_file.h for seq_printf drm/ttm: Add sysfs interface to control pool allocator. drm/ttm: Use set_pages_array_wc instead of set_memory_wc. arch/x86: Add array variants for setting memory to wc caching. drm/nouveau: Add ttm page pool debugfs file. drm/radeon/kms: Add ttm page pool debugfs file. drm/ttm: Add debugfs output entry to pool allocator. drm/ttm: add pool wc/uc page allocator V3	2010-04-20 13:12:28 +10:00
Pauli Nieminen	4f64625412	arch/x86: Add array variants for setting memory to wc caching. Setting single memory pages at a time to wc takes a lot time in cache flush. To reduce number of cache flush set_pages_array_wc and set_memory_array_wc can be used to set multiple pages to WC with single cache flush. This improves allocation performance for wc cached pages in drm/ttm. CC: Suresh Siddha <suresh.b.siddha@intel.com> CC: Venkatesh Pallipadi <venkatesh.pallipadi@gmail.com> Signed-off-by: Pauli Nieminen <suokkos@gmail.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2010-04-06 11:36:06 +10:00
Tejun Heo	336f5899d2	Merge branch 'master' into export-slabh	2010-04-05 11:37:28 +09:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Yinghai Lu	c967da6a0b	x86: Make sure free_init_pages() frees pages on page boundary When CONFIG_NO_BOOTMEM=y, it could use memory more effiently, or in a more compact fashion. Example: Allocated new RAMDISK: 00ec2000 - 0248ce57 Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56 The new RAMDISK's end is not page aligned. Last page could be shared with other users. When free_init_pages are called for initrd or .init, the page could be freed and we could corrupt other data. code segment in free_init_pages(): \| for (; addr < end; addr += PAGE_SIZE) { \| ClearPageReserved(virt_to_page(addr)); \| init_page_count(virt_to_page(addr)); \| memset((void *)(addr & ~(PAGE_SIZE-1)), \| POISON_FREE_INITMEM, PAGE_SIZE); \| free_page(addr); \| totalram_pages++; \| } last half page could be used as one whole free page. So page align the boundaries. -v2: make the original initramdisk to be aligned, according to Johannes, otherwise we have the chance to lose one page. we still need to keep initrd_end not aligned, otherwise it could confuse decompressor. -v3: change to WARN_ON instead, suggested by Johannes. -v4: use PAGE_ALIGN, suggested by Johannes. We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN Add comments about assuming ramdisk start is aligned in relocate_initrd(), change to re get ramdisk_image instead of save it to make diff smaller. Add warning for wrong range, suggested by Johannes. -v6: remove one WARN() We need to align beginning in free_init_pages() do not copy more than ramdisk_size, noticed by Johannes Reported-by: Stanislaw Gruszka <sgruszka@redhat.com> Tested-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-03-29 18:55:33 +02:00
Linus Torvalds	15c989d4d1	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, k8 nb: Fix boot crash: enable k8_northbridges unconditionally on AMD systems x86, UV: Fix target_cpus() in x2apic_uv_x.c x86: Reduce per cpu warning boot up messages x86: Reduce per cpu MCA boot up messages x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already	2010-03-13 14:45:49 -08:00
Linus Torvalds	a626b46e17	Merge branch 'x86-bootmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-bootmem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) early_res: Need to save the allocation name in drop_range_partial() sparsemem: Fix compilation on PowerPC early_res: Add free_early_partial() x86: Fix non-bootmem compilation on PowerPC core: Move early_res from arch/x86 to kernel/ x86: Add find_fw_memmap_area Move round_up/down to kernel.h x86: Make 32bit support NO_BOOTMEM early_res: Enhance check_and_double_early_res x86: Move back find_e820_area to e820.c x86: Add find_early_area_size x86: Separate early_res related code from e820.c x86: Move bios page reserve early to head32/64.c sparsemem: Put mem map for one node together. sparsemem: Put usemap for one node together x86: Make 64 bit use early_res instead of bootmem before slab x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMA x86: Make early_node_mem get mem > 4 GB if possible x86: Dynamically increase early_res array size x86: Introduce max_early_res and early_res_count ...	2010-03-03 08:15:05 -08:00
Pallipadi, Venkatesh	4daa2a8093	x86, pat: In rbt_memtype_check_insert(), update new->type only if valid new->type should only change when there is a valid ret_type. Otherwise the requested type and return type should be same. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100224214355.GA16431@linux-os.sc.intel.com> Tested-by: Jack Steiner <steiner@sgi.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-03-01 14:28:48 -08:00
Linus Torvalds	1eca9acbf9	Merge branch 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, numa: Remove configurable node size support for numa emulation x86, numa: Add fixed node size option for numa emulation x86, numa: Fix numa emulation calculation of big nodes x86, acpi: Map hotadded cpu to correct node.	2010-02-28 10:39:36 -08:00
Linus Torvalds	46bbffad54	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mm: Unify kernel_physical_mapping_init() API x86, mm: Allow highmem user page tables to be disabled at boot time x86: Do not reserve brk for DMI if it's not going to be used x86: Convert tlbstate_lock to raw_spinlock x86: Use the generic page_is_ram() x86: Remove BIOS data range from e820 Move page_is_ram() declaration to mm.h Generic page_is_ram: use __weak resources: introduce generic page_is_ram()	2010-02-28 10:38:45 -08:00
Linus Torvalds	a7f16d10b5	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Mark atomic irq ops raw for 32bit legacy x86: Merge show_regs() x86: Macroise x86 cache descriptors x86-32: clean up rwsem inline asm statements x86: Merge asm/atomic_{32,64}.h x86: Sync asm/atomic_32.h and asm/atomic_64.h x86: Split atomic64_t functions into seperate headers x86-64: Modify memcpy()/memset() alternatives mechanism x86-64: Modify copy_user_generic() alternatives mechanism x86: Lift restriction on the location of FIX_BTMAP_* x86, core: Optimize hweight32()	2010-02-28 10:35:09 -08:00
Linus Torvalds	f91b22c35f	Merge branches 'core-ipi-for-linus', 'core-locking-for-linus', 'tracing-fixes-for-linus', 'x86-debug-for-linus', 'x86-doc-for-linus', 'x86-gpu-for-linus' and 'x86-rlimit-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: plist: Fix grammar mistake, and c-style mistake * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: kprobes: Add mcount to the kprobes blacklist * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86_64: Print modules like i386 does * 'x86-doc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Put 'nopat' in kernel-parameters * 'x86-gpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-64: Allow fbdev primary video code * 'x86-rlimit-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Use helpers for rlimits	2010-02-28 10:04:02 -08:00
Pekka Enberg	c1fd1b4383	x86, mm: Unify kernel_physical_mapping_init() API This patch changes the 32-bit version of kernel_physical_mapping_init() to return the last mapped address like the 64-bit one so that we can unify the call-site in init_memory_mapping(). Cc: Yinghai Lu <yinghai@kernel.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <alpine.DEB.2.00.1002241703570.1180@melkki.cs.helsinki.fi> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-25 15:15:21 -08:00
Ian Campbell	1431559200	x86, mm: Allow highmem user page tables to be disabled at boot time Distros generally (I looked at Debian, RHEL5 and SLES11) seem to enable CONFIG_HIGHPTE for any x86 configuration which has highmem enabled. This means that the overhead applies even to machines which have a fairly modest amount of high memory and which therefore do not really benefit from allocating PTEs in high memory but still pay the price of the additional mapping operations. Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but no actual highptes being allocated there was a reduction in system time used from 59.737s to 55.9s. With CONFIG_HIGHPTE=y and highmem PTEs being allocated: Average Optimal load -j 4 Run (std deviation): Elapsed Time 175.396 (0.238914) User Time 515.983 (5.85019) System Time 59.737 (1.26727) Percent CPU 263.8 (71.6796) Context Switches 39989.7 (4672.64) Sleeps 42617.7 (246.307) With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated: Average Optimal load -j 4 Run (std deviation): Elapsed Time 174.278 (0.831968) User Time 515.659 (6.07012) System Time 55.9 (1.07799) Percent CPU 263.8 (71.266) Context Switches 39929.6 (4485.13) Sleeps 42583.7 (373.039) This patch allows the user to control the allocation of PTEs in highmem from the command line ("userpte=nohigh") but retains the status-quo as the default. It is possible that some simple heuristic could be developed which allows auto-tuning of this option however I don't have a sufficiently large machine available to me to perform any particularly meaningful experiments. We could probably handwave up an argument for a threshold at 16G of total RAM. Assuming 768M of lowmem we have 196608 potential lowmem PTE pages. Each page can map 2M of RAM in a PAE-enabled configuration, meaning a maximum of 384G of RAM could potentially be mapped using lowmem PTEs. Even allowing generous factor of 10 to account for other required lowmem allocations, generous slop to account for page sharing (which reduces the total amount of RAM mappable by a given number of PT pages) and other innacuracies in the estimations it would seem that even a 32G machine would not have a particularly pressing need for highmem PTEs. I think 32G could be considered to be at the upper bound of what might be sensible on a 32 bit machine (although I think in practice 64G is still supported). It's seems questionable if HIGHPTE is even a win for any amount of RAM you would sensibly run a 32 bit kernel on rather than going 64 bit. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-25 10:28:19 +01:00
Suresh Siddha	281ff33b7c	x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already We currently enforce the !RW mapping for the kernel mapping that maps holes between different text, rodata and data sections. However, kernel identity mappings will have different RWX permissions to the pages mapping to text and to the pages padding (which are freed) the text, rodata sections. Hence kernel identity mappings will be broken to smaller pages. For 64-bit, kernel text and kernel identity mappings are different, so we can enable protection checks that come with CONFIG_DEBUG_RODATA, as well as retain 2MB large page mappings for kernel text. Konrad reported a boot failure with the Linux Xen paravirt guest because of this. In this paravirt guest case, the kernel text mapping and the kernel identity mapping share the same page-table pages. Thus forcing the !RW mapping for some of the kernel mappings also cause the kernel identity mappings to be read-only resulting in the boot failure. Linux Xen paravirt guest also uses 4k mappings and don't use 2M mapping. Fix this issue and retain large page performance advantage for native kernels by not working hard and not enforcing !RW for the kernel text mapping, if the current mapping is already using small page mapping. Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1266522700.2909.34.camel@sbs-t61.sc.intel.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@kernel.org [2.6.32, 2.6.33] Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-22 15:09:31 -08:00
Pallipadi, Venkatesh	9e41a49aab	x86, pat: Migrate to rbtree only backend for pat memtype management Move pat backend to fully rbtree based implementation from the existing rbtree and linked list hybrid. New rbtree based solution uses interval trees (augmented rbtrees) in order to store the PAT ranges. The new code seprates out the pat backend to pat_rbtree.c file, making is cleaner. The change also makes the PAT lookup, reserve and free operations more optimal, as we don't have to traverse linear linked list of few tens of entries in normal case. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100210232607.GB11465@linux-os.sc.intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-18 15:41:36 -08:00
venkatesh.pallipadi@intel.com	be5a0c126a	x86, pat: Preparatory changes in pat.c for bigger rbtree change Minor changes in pat.c to cleanup code and make it smoother to introduce bigger rbtree only change in the following patch. The changes are cleaup only and should not have any functional impact. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20100210195909.792781000@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-18 15:41:21 -08:00
Catalin Marinas	81fc03909a	kmemcheck: Test the full object in kmemcheck_is_obj_initialized() This is a fix for bug #14845 (bugzilla.kernel.org). The update_checksum() function in mm/kmemleak.c calls kmemcheck_is_obj_initialised() before scanning an object. When KMEMCHECK_PARTIAL_OK is enabled, this function returns true. However, the crc32_le() reads smaller intervals (32-bit) for which kmemleak_is_obj_initialised() may be false leading to a kmemcheck warning. Note that kmemcheck_is_obj_initialized() is currently only used by kmemleak before scanning a memory location. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Christian Casteyde <casteyde.christian@free.fr> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2010-02-17 21:39:08 +02:00
Thomas Gleixner	39c662f60c	x86: Convert tlbstate_lock to raw_spinlock Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-02-17 18:28:59 +01:00
Thomas Gleixner	b7e56edba4	Merge branch 'linus' into x86/mm x86/mm is on 32-rc4 and missing the spinlock namespace changes which are needed for further commits into this topic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2010-02-17 18:28:05 +01:00
David Rientjes	ca2107c9d6	x86, numa: Remove configurable node size support for numa emulation Now that numa=fake=<size>[MG] is implemented, it is possible to remove configurable node size support. The command-line parsing was already broken (numa=fake=*128, for example, would not work) and since fake nodes are now interleaved over physical nodes, this support is no longer required. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-15 14:34:18 -08:00
David Rientjes	8df5bb34de	x86, numa: Add fixed node size option for numa emulation numa=fake=N specifies the number of fake nodes, N, to partition the system into and then allocates them by interleaving over physical nodes. This requires knowledge of the system capacity when attempting to allocate nodes of a certain size: either very large nodes to benchmark scalability of code that operates on individual nodes, or very small nodes to find bugs in the VM. This patch introduces numa=fake=<size>[MG] so it is possible to specify the size of each node to allocate. When used, nodes of the size specified will be allocated and interleaved over the set of physical nodes. FAKE_NODE_MIN_SIZE was also moved to the more-appropriate include/asm/numa_64.h. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-15 14:34:10 -08:00
David Rientjes	68fd111e02	x86, numa: Fix numa emulation calculation of big nodes numa=fake=N uses split_nodes_interleave() to partition the system into N fake nodes. Each node size must have be a multiple of FAKE_NODE_MIN_SIZE, otherwise it is possible to get strange alignments. Because of this, the remaining memory from each node when rounded to FAKE_NODE_MIN_SIZE is consolidated into a number of "big nodes" that are bigger than the rest. The calculation of the number of big nodes is incorrect since it is using a logical AND operator when it should be multiplying the rounded-off portion of each node with N. Signed-off-by: David Rientjes <rientjes@google.com> LKML-Reference: <alpine.DEB.2.00.1002151342230.26927@chino.kir.corp.google.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-15 14:34:04 -08:00
Yinghai Lu	59be5a8e8c	x86: Make 32bit support NO_BOOTMEM Let's make 32bit consistent with 64bit. -v2: Andrew pointed out for 32bit that we should use -1ULL Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-25-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-12 09:42:39 -08:00
Yinghai Lu	9bdac91424	sparsemem: Put mem map for one node together. Add vmemmap_alloc_block_buf for mem map only. It will fallback to the old way if it cannot get a block that big. Before this patch, when a node have 128g ram installed, memmap are split into two parts or more. [ 0.000000] [ffffea0000000000-ffffea003fffffff] PMD -> [ffff880100600000-ffff88013e9fffff] on node 1 [ 0.000000] [ffffea0040000000-ffffea006fffffff] PMD -> [ffff88013ec00000-ffff88016ebfffff] on node 1 [ 0.000000] [ffffea0070000000-ffffea007fffffff] PMD -> [ffff882000600000-ffff8820105fffff] on node 0 [ 0.000000] [ffffea0080000000-ffffea00bfffffff] PMD -> [ffff882010800000-ffff8820507fffff] on node 0 [ 0.000000] [ffffea00c0000000-ffffea00dfffffff] PMD -> [ffff882050a00000-ffff8820709fffff] on node 0 [ 0.000000] [ffffea00e0000000-ffffea00ffffffff] PMD -> [ffff884000600000-ffff8840205fffff] on node 2 [ 0.000000] [ffffea0100000000-ffffea013fffffff] PMD -> [ffff884020800000-ffff8840607fffff] on node 2 [ 0.000000] [ffffea0140000000-ffffea014fffffff] PMD -> [ffff884060a00000-ffff8840709fffff] on node 2 [ 0.000000] [ffffea0150000000-ffffea017fffffff] PMD -> [ffff886000600000-ffff8860305fffff] on node 3 [ 0.000000] [ffffea0180000000-ffffea01bfffffff] PMD -> [ffff886030800000-ffff8860707fffff] on node 3 [ 0.000000] [ffffea01c0000000-ffffea01ffffffff] PMD -> [ffff888000600000-ffff8880405fffff] on node 4 [ 0.000000] [ffffea0200000000-ffffea022fffffff] PMD -> [ffff888040800000-ffff8880707fffff] on node 4 [ 0.000000] [ffffea0230000000-ffffea023fffffff] PMD -> [ffff88a000600000-ffff88a0105fffff] on node 5 [ 0.000000] [ffffea0240000000-ffffea027fffffff] PMD -> [ffff88a010800000-ffff88a0507fffff] on node 5 [ 0.000000] [ffffea0280000000-ffffea029fffffff] PMD -> [ffff88a050a00000-ffff88a0709fffff] on node 5 [ 0.000000] [ffffea02a0000000-ffffea02bfffffff] PMD -> [ffff88c000600000-ffff88c0205fffff] on node 6 [ 0.000000] [ffffea02c0000000-ffffea02ffffffff] PMD -> [ffff88c020800000-ffff88c0607fffff] on node 6 [ 0.000000] [ffffea0300000000-ffffea030fffffff] PMD -> [ffff88c060a00000-ffff88c0709fffff] on node 6 [ 0.000000] [ffffea0310000000-ffffea033fffffff] PMD -> [ffff88e000600000-ffff88e0305fffff] on node 7 [ 0.000000] [ffffea0340000000-ffffea037fffffff] PMD -> [ffff88e030800000-ffff88e0707fffff] on node 7 after patch will get [ 0.000000] [ffffea0000000000-ffffea006fffffff] PMD -> [ffff880100200000-ffff88016e5fffff] on node 0 [ 0.000000] [ffffea0070000000-ffffea00dfffffff] PMD -> [ffff882000200000-ffff8820701fffff] on node 1 [ 0.000000] [ffffea00e0000000-ffffea014fffffff] PMD -> [ffff884000200000-ffff8840701fffff] on node 2 [ 0.000000] [ffffea0150000000-ffffea01bfffffff] PMD -> [ffff886000200000-ffff8860701fffff] on node 3 [ 0.000000] [ffffea01c0000000-ffffea022fffffff] PMD -> [ffff888000200000-ffff8880701fffff] on node 4 [ 0.000000] [ffffea0230000000-ffffea029fffffff] PMD -> [ffff88a000200000-ffff88a0701fffff] on node 5 [ 0.000000] [ffffea02a0000000-ffffea030fffffff] PMD -> [ffff88c000200000-ffff88c0701fffff] on node 6 [ 0.000000] [ffffea0310000000-ffffea037fffffff] PMD -> [ffff88e000200000-ffff88e0701fffff] on node 7 -v2: change buf to vmemmap_buf instead according to Ingo also add CONFIG_SPARSEMEM_ALLOC_MEM_MAP_TOGETHER according to Ingo -v3: according to Andrew, use sizeof(name) instead of hard coded 15 Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-19-git-send-email-yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-12 09:42:38 -08:00
Yinghai Lu	08677214e3	x86: Make 64 bit use early_res instead of bootmem before slab Finally we can use early_res to replace bootmem for x86_64 now. Still can use CONFIG_NO_BOOTMEM to enable it or not. -v2: fix 32bit compiling about MAX_DMA32_PFN -v3: folded bug fix from LKML message below Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B747239.4070907@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-12 09:41:59 -08:00
Linus Torvalds	5ea8d37592	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, apic: Don't use logical-flat mode when CPU hotplug may exceed 8 CPUs x86-32: Make AT_VECTOR_SIZE_ARCH=2 x86/agp: Fix amd64-agp module initialization regression x86, doc: Fix minor spelling error in arch/x86/mm/gup.c	2010-02-11 14:01:10 -08:00
Yinghai Lu	cef625eef8	x86: Make early_node_mem get mem > 4 GB if possible So we could put pgdata for the node high, and later sparse vmmap will get the section nr that need. With this patch will make <4 GB ram not use a sparse vmmap. before this patch, will get, before swiotlb try get bootmem [ 0.000000] nid=1 start=0 end=2080000 aligned=1 [ 0.000000] free [10 - 96] [ 0.000000] free [b12 - 1000] [ 0.000000] free [359f - 38a3] [ 0.000000] free [38b5 - 3a00] [ 0.000000] free [41e01 - 42000] [ 0.000000] free [73dde - 73e00] [ 0.000000] free [73fdd - 74000] [ 0.000000] free [741dd - 74200] [ 0.000000] free [743dd - 74400] [ 0.000000] free [745dd - 74600] [ 0.000000] free [747dd - 74800] [ 0.000000] free [749dd - 74a00] [ 0.000000] free [74bdd - 74c00] [ 0.000000] free [74ddd - 74e00] [ 0.000000] free [74fdd - 75000] [ 0.000000] free [751dd - 75200] [ 0.000000] free [753dd - 75400] [ 0.000000] free [755dd - 75600] [ 0.000000] free [757dd - 75800] [ 0.000000] free [759dd - 75a00] [ 0.000000] free [75bdd - 7bf5f] [ 0.000000] free [7f730 - 7f750] [ 0.000000] free [100000 - 2080000] [ 0.000000] total free 1f87170 [ 93.301474] Placing 64MB software IO TLB between ffff880075bdd000 - ffff880079bdd000 [ 93.311814] software IO TLB at phys 0x75bdd000 - 0x79bdd000 with this patch will get: before swiotlb try get bootmem [ 0.000000] nid=1 start=0 end=2080000 aligned=1 [ 0.000000] free [a - 96] [ 0.000000] free [702 - 1000] [ 0.000000] free [359f - 3600] [ 0.000000] free [37de - 3800] [ 0.000000] free [39dd - 3a00] [ 0.000000] free [3bdd - 3c00] [ 0.000000] free [3ddd - 3e00] [ 0.000000] free [3fdd - 4000] [ 0.000000] free [41dd - 4200] [ 0.000000] free [43dd - 4400] [ 0.000000] free [45dd - 4600] [ 0.000000] free [47dd - 4800] [ 0.000000] free [49dd - 4a00] [ 0.000000] free [4bdd - 4c00] [ 0.000000] free [4ddd - 4e00] [ 0.000000] free [4fdd - 5000] [ 0.000000] free [51dd - 5200] [ 0.000000] free [53dd - 5400] [ 0.000000] free [55dd - 7bf5f] [ 0.000000] free [7f730 - 7f750] [ 0.000000] free [100428 - 100600] [ 0.000000] free [13ea01 - 13ec00] [ 0.000000] free [170800 - 2080000] [ 0.000000] total free 1f87170 [ 92.689485] PCI-DMA: Using software bounce buffering for IO (SWIOTLB) [ 92.699799] Placing 64MB software IO TLB between ffff8800055dd000 - ffff8800095dd000 [ 92.710916] software IO TLB at phys 0x55dd000 - 0x95dd000 so will get enough space below 4G, aka pfn 0x100000 Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-15-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-10 17:47:18 -08:00
Yinghai Lu	1842f90cc9	x86: Call early_res_to_bootmem one time Simplify setup_node_mem: don't use bootmem from other node, instead just find_e820_area in early_node_mem. This keeps the boundary between early_res and boot mem more clear, and lets us only call early_res_to_bootmem() one time instead of for all nodes. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-10 17:47:18 -08:00
H. Peter Anvin	84abd88a70	Merge remote branch 'linus/master' into x86/bootmem	2010-02-10 16:55:28 -08:00
Shaohui Zheng	ea0854170c	memory hotplug: fix a bug on /dev/mem for 64-bit kernels Newly added memory can not be accessed via /dev/mem, because we do not update the variables high_memory, max_pfn and max_low_pfn. Add a function update_end_of_memory_vars() to update these variables for 64-bit kernels. [akpm@linux-foundation.org: simplify comment] Signed-off-by: Shaohui Zheng <shaohui.zheng@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Li Haicheng <haicheng.li@intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-02-02 18:11:23 -08:00
Andy Shevchenko	ab09809f2e	x86, doc: Fix minor spelling error in arch/x86/mm/gup.c Fix minor spelling error in comment. No code change. Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com> LKML-Reference: <201002022238.o12McDiF018720@imap1.linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-02 16:00:44 -08:00
Wu Fengguang	13ca0fcaa3	x86: Use the generic page_is_ram() The generic resource based page_is_ram() works better with memory hotplug/hotremove. So switch the x86 e820map based code to it. CC: Andi Kleen <andi@firstfloor.org> CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> LKML-Reference: <20100122033004.470767217@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-01 16:58:17 -08:00
Yinghai Lu	1b5576e69a	x86: Remove BIOS data range from e820 In preparation for moving to the generic page_is_ram(), make explicit what we expect to be reserved and not reserved. Tested-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20100122033004.335813103@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-02-01 16:58:17 -08:00
Jiri Slaby	2854e72b58	x86: Use helpers for rlimits Make sure compiler won't do weird things with limits. Fetching them twice may return 2 different values after writable limits are implemented. We can either use rlimit helpers added in `3e10e716ab` or ACCESS_ONCE if not applicable; this patch uses the helpers. Signed-off-by: Jiri Slaby <jslaby@suse.cz> LKML-Reference: <1264609942-24621-1-git-send-email-jslaby@suse.cz> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-01-27 15:17:31 -08:00
David Rientjes	3a5fc0e40c	x86: Set hotpluggable nodes in nodes_possible_map nodes_possible_map does not currently include nodes that have SRAT entries that are all ACPI_SRAT_MEM_HOT_PLUGGABLE since the bit is cleared in nodes_parsed if it does not have an online address range. Unequivocally setting the bit in nodes_parsed is insufficient since existing code, such as acpi_get_nodes(), assumes all nodes in the map have online address ranges. In fact, all code using nodes_parsed assumes such nodes represent an address range of online memory. nodes_possible_map is created by unioning nodes_parsed and cpu_nodes_parsed; the former represents nodes with online memory and the latter represents memoryless nodes. We now set the bit for hotpluggable nodes in cpu_nodes_parsed so that it also gets set in nodes_possible_map. [ hpa: Haicheng Li points out that this makes the naming of the variable cpu_nodes_parsed somewhat counterintuitive. However, leave it as is in the interest of keeping the pure bug fix patch small. ] Signed-off-by: David Rientjes <rientjes@google.com> Tested-by: Haicheng Li <haicheng.li@linux.intel.com> LKML-Reference: <alpine.DEB.2.00.1001201152040.30528@chino.kir.corp.google.com> Cc: <stable@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2010-01-23 06:21:57 +01:00
Luca Barbieri	0bb7a95f54	hw-breakpoints, perf: Fix broken mmiotrace due to dr6 by reference change Commit `62edab9056` (from June 2009 but merged in 2.6.33) changes notify_die to pass dr6 by reference. However, it forgets to fix the check for DR_STEP in kmmio.c, breaking mmiotrace. It also passes a wrong value to the post handler. This simple fix makes mmiotrace work again. Signed-off-by: Luca Barbieri <luca@luca-barbieri.com> Acked-by: K.Prasad <prasad@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1263634770-14578-1-git-send-email-luca@luca-barbieri.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-01-17 08:01:44 +01:00
Andreas Fenkart	4b529401c5	mm: make totalhigh_pages unsigned long Makes it consistent with the extern declaration, used when CONFIG_HIGHMEM is set Removes redundant casts in printout messages Signed-off-by: Andreas Fenkart <andreas.fenkart@streamunlimited.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Lennox Wu <lennox.wu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-01-11 09:34:03 -08:00
Jan Beulich	499a5f1efa	x86: Lift restriction on the location of FIX_BTMAP_* The early ioremap fixmap entries cover half (or for 32-bit non-PAE, a quarter) of a page table, yet they got uncondtitionally aligned so far to a 256-entry boundary. This is not necessary if the range of page table entries anyway falls into a single page table. This buys back, for (theoretically) 50% of all configurations (25% of all non-PAE ones), at least some of the lowmem necessarily lost with commit `e621bd1895`. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B2BB66F0200007800026AD6@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-30 11:57:30 +01:00
Pekka Enberg	c0ca9da442	x86, kmemcheck: Use KERN_WARNING for error reporting As suggested by Vegard Nossum, use KERN_WARNING for error reporting to make sure kmemcheck reports end up in syslog. Suggested-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1261990935.4641.7.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-28 10:28:35 +01:00
Ingo Molnar	605c1a187f	Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent	2009-12-28 09:23:13 +01:00
Linus Torvalds	3981e15286	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system Makefile: Unexport LC_ALL instead of clearing it x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk x86: Reenable TSC sync check at boot, even with NONSTOP_TSC x86: Don't use POSIX character classes in gen-insn-attr-x86.awk Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA x86: Fix checking of SRAT when node 0 ram is not from 0 x86, cpuid: Add "volatile" to asm in native_cpuid() x86, msr: msrs_alloc/free for CONFIG_SMP=n x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space x86: Add IA32_TSC_AUX MSR and use it x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers initramfs: add missing decompressor error check bzip2: Add missing checks for malloc returning NULL bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure	2009-12-19 09:48:14 -08:00
Yinghai Lu	3299625036	x86: Fix checking of SRAT when node 0 ram is not from 0 Found one system that boot from socket1 instead of socket0, SRAT get rejected... [ 0.000000] SRAT: Node 1 PXM 0 0-a0000 [ 0.000000] SRAT: Node 1 PXM 0 100000-80000000 [ 0.000000] SRAT: Node 1 PXM 0 100000000-2080000000 [ 0.000000] SRAT: Node 0 PXM 1 2080000000-4080000000 [ 0.000000] SRAT: Node 2 PXM 2 4080000000-6080000000 [ 0.000000] SRAT: Node 3 PXM 3 6080000000-8080000000 [ 0.000000] SRAT: Node 4 PXM 4 8080000000-a080000000 [ 0.000000] SRAT: Node 5 PXM 5 a080000000-c080000000 [ 0.000000] SRAT: Node 6 PXM 6 c080000000-e080000000 [ 0.000000] SRAT: Node 7 PXM 7 e080000000-10080000000 ... [ 0.000000] NUMA: Allocated memnodemap from 500000 - 701040 [ 0.000000] NUMA: Using 20 for the hash shift. [ 0.000000] Adding active range (0, 0x2080000, 0x4080000) 0 entries of 3200 used [ 0.000000] Adding active range (1, 0x0, 0x96) 1 entries of 3200 used [ 0.000000] Adding active range (1, 0x100, 0x7f750) 2 entries of 3200 used [ 0.000000] Adding active range (1, 0x100000, 0x2080000) 3 entries of 3200 used [ 0.000000] Adding active range (2, 0x4080000, 0x6080000) 4 entries of 3200 used [ 0.000000] Adding active range (3, 0x6080000, 0x8080000) 5 entries of 3200 used [ 0.000000] Adding active range (4, 0x8080000, 0xa080000) 6 entries of 3200 used [ 0.000000] Adding active range (5, 0xa080000, 0xc080000) 7 entries of 3200 used [ 0.000000] Adding active range (6, 0xc080000, 0xe080000) 8 entries of 3200 used [ 0.000000] Adding active range (7, 0xe080000, 0x10080000) 9 entries of 3200 used [ 0.000000] SRAT: PXMs only cover 917504MB of your 1048566MB e820 RAM. Not used. [ 0.000000] SRAT: SRAT not used. the early_node_map is not sorted because node0 with non zero start come first. so try to sort it right away after all regions are registered. also fixs refression by `8716273c` (x86: Export srat physical topology) -v2: make it more solid to handle cross node case like node0 [0,4g), [8,12g) and node1 [4g, 8g), [12g, 16g) -v3: update comments. Reported-and-tested-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B2579D2.3010201@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-12-16 16:43:37 -08:00
Linus Torvalds	75b08038ce	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, mce: Clean up thermal init by introducing intel_thermal_supported() x86, mce: Thermal monitoring depends on APIC being enabled x86: Gart: fix breakage due to IOMMU initialization cleanup x86: Move swiotlb initialization before dma32_free_bootmem x86: Fix build warning in arch/x86/mm/mmio-mod.c x86: Remove usedac in feature-removal-schedule.txt x86: Fix duplicated UV BAU interrupt vector nvram: Fix write beyond end condition; prove to gcc copy is safe mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe x86: Limit the number of processor bootup messages x86: Remove enabling x2apic message for every CPU doc: Add documentation for bootloader_{type,version} x86, msr: Add support for non-contiguous cpumasks x86: Use find_e820() instead of hard coded trampoline address x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c	2009-12-14 12:36:46 -08:00
Joe Perches	eba11d6da7	x86: Fix build warning in arch/x86/mm/mmio-mod.c Stephen Rothwell reported these warnings: arch/x86/mm/mmio-mod.c: In function 'print_pte': arch/x86/mm/mmio-mod.c💯 warning: too many arguments for format arch/x86/mm/mmio-mod.c:106: warning: too many arguments for format The 'fmt' was left out accidentally. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Joe Perches <joe@perches.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus <torvalds@linux-foundation.org> LKML-Reference: <1260775443.18538.16.camel@Joe-Laptop.home> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-14 08:55:43 +01:00
Linus Torvalds	756300983f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Fix PCI hotplug with passthrough mode x86/amd-iommu: Fix passthrough mode x86: mmio-mod.c: Use pr_fmt x86: kmmio.c: Add and use pr_fmt(fmt) x86: i8254.c: Add pr_fmt(fmt) x86: setup_percpu.c: Use pr_<level> and add pr_fmt(fmt) x86: es7000_32.c: Use pr_<level> and add pr_fmt(fmt) x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs() x86: Factor duplicated code out of __show_regs() into show_regs_common() arch/x86/kernel/microcode*: Use pr_fmt() and remove duplicated KERN_ERR prefix x86, mce: fix confusion between bank attributes and mce attributes x86/mce: Set up timer unconditionally x86: Fix bogus warning in apic_noop.apic_write() x86: Fix typo in arch/x86/mm/kmmio.c x86: ASUS P4S800 reboot=bios quirk	2009-12-11 20:47:59 -08:00
H. Peter Anvin	5c6baba84e	Merge commit 'linus/master' into x86/urgent	2009-12-11 10:57:42 -08:00
Christoph Hellwig	6b2f3d1f76	vfs: Implement proper O_SYNC semantics While Linux provided an O_SYNC flag basically since day 1, it took until Linux 2.4.0-test12pre2 to actually get it implemented for filesystems, since that day we had generic_osync_around with only minor changes and the great "For now, when the user asks for O_SYNC, we'll actually give O_DSYNC" comment. This patch intends to actually give us real O_SYNC semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC patches which are required before this patch it's actually surprisingly simple, we just need to figure out when to set the datasync flag to vfs_fsync_range and when not. This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's numerical value to keep binary compatibility, and adds a new real O_SYNC flag. To guarantee backwards compatiblity it is defined as expanding to both the O_DSYNC and the new additional binary flag (__O_SYNC) to make sure we are backwards-compatible when compiled against the new headers. This also means that all places that don't care about the differences can just check O_DSYNC and get the right behaviour for O_SYNC, too - only places that actuall care need to check __O_SYNC in addition. Drivers and network filesystems have been updated in a fail safe way to always do the full sync magic if O_DSYNC is set. The few places setting O_SYNC for lower layers are kept that way for now to stay failsafe. We enforce that O_DSYNC is set when __O_SYNC is set early in the open path to make sure we always get these sane options. Note that parisc really screwed up their headers as they already define a O_DSYNC that has always been a no-op. We try to repair it by using it for the new O_DSYNC and redefinining O_SYNC to send both the traditional O_SYNC numerical value _and_ the O_DSYNC one. Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Grant Grundler <grundler@parisc-linux.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@sun.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jan Kara <jack@suse.cz>	2009-12-10 15:02:50 +01:00
Joe Perches	3a0340be06	x86: mmio-mod.c: Use pr_fmt - Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Remove #define NAME - Remove NAME from pr_<level> Signed-off-by: Joe Perches <joe@perches.com> LKML-Reference: <009cb214c45ef932df0242856228f4739cc91408.1260383912.git.joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-10 08:57:51 +01:00
Joe Perches	1bd591a5f1	x86: kmmio.c: Add and use pr_fmt(fmt) - Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt - Strip "kmmio: " from pr_<level>s Signed-off-by: Joe Perches <joe@perches.com> LKML-Reference: <7aa509f8a23933036d39f54bd51e9acc52068049.1260383912.git.joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-10 08:57:51 +01:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
Ingo Molnar	4c68db38c8	Merge branch 'linus' into x86/urgent Merge reason: We want to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-09 08:24:57 +01:00
Linus Torvalds	b391738bd1	Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: pat: Remove ioremap_default() x86: pat: Clean up req_type special case for reserve_memtype() x86: Relegate CONFIG_PAT and CONFIG_MTRR configurability to EMBEDDED	2009-12-08 13:34:17 -08:00
Linus Torvalds	e33c019722	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits) x86, mm: Correct the implementation of is_untracked_pat_range() x86/pat: Trivial: don't create debugfs for memtype if pat is disabled x86, mtrr: Fix sorting of mtrr after subtracting x86: Move find_smp_config() earlier and avoid bootmem usage x86, platform: Change is_untracked_pat_range() to bool; cleanup init x86: Change is_ISA_range() into an inline function x86, mm: is_untracked_pat_range() takes a normal semiclosed range x86, mm: Call is_untracked_pat_range() rather than is_ISA_range() x86: UV SGI: Don't track GRU space in PAT x86: SGI UV: Fix BAU initialization x86, numa: Use near(er) online node instead of roundrobin for NUMA x86, numa, bootmem: Only free bootmem on NUMA failure path x86: Change crash kernel to reserve via reserve_early() x86: Eliminate redundant/contradicting cache line size config options x86: When cleaning MTRRs, do not fold WP into UC x86: remove "extern" from function prototypes in <asm/proto.h> x86, mm: Report state of NX protections during boot x86, mm: Clean up and simplify NX enablement x86, pageattr: Make set_memory_(x\|nx) aware of NX support x86, sleep: Always save the value of EFER ... Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range) to 'struct x86_platform_ops') in arch/x86/include/asm/x86_init.h arch/x86/kernel/x86_init.c	2009-12-08 13:27:33 -08:00
Jiri Kosina	d014d04386	Merge branch 'for-next' into for-linus Conflicts: kernel/irq/chip.c	2009-12-07 18:36:35 +01:00
Ingo Molnar	f3d607c6b3	Merge branch 'linus' into x86/urgent Merge reason: we want to queue up a dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-07 13:14:18 +01:00
Shaun Patterson	8055039c2a	x86: Fix typo in arch/x86/mm/kmmio.c Signed-off-by: Shaun Patterson <shaunpatterson@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Cc: pq@iki.fi LKML-Reference: <1260027694.10074.170.camel@linux-4lgc.site> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-12-06 09:23:40 +01:00
Linus Torvalds	6ec22f9b03	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Limit number of per cpu TSC sync messages x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes x86: Suppress stack overrun message for init_task x86: Fix cpu_devs[] initialization in early_cpu_init() x86: Remove CPU cache size output for non-Intel too x86: Minimise printk spew from per-vendor init code x86: Remove the CPU cache size printk's cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c x86: Make sure we also print a Code: line for show_regs()	2009-12-05 15:33:27 -08:00
Linus Torvalds	ef26b1691d	Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h x86/alternatives: Check replacementlen <= instrlen at build time x86, 64-bit: Set data segments to null after switching to 64-bit mode x86: Clean up the loadsegment() macro x86: Optimize loadsegment() x86: Add missing might_fault() checks to copy_{to,from}_user() x86-64: __copy_from_user_inatomic() adjustments x86: Remove unused thread_return label from switch_to() x86, 64-bit: Fix bstep_iret jump x86: Don't use the strict copy checks when branch profiling is in use x86, 64-bit: Move K8 B step iret fixup to fault entry asm x86: Generate cmpxchg build failures x86: Add a Kconfig option to turn the copy_from_user warnings into errors x86: Turn the copy_from_user check into an (optional) compile time warning x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()	2009-12-05 15:32:03 -08:00
Linus Torvalds	a77d2e081b	Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits) x86, apic: Enable lapic nmi watchdog on AMD Family 11h x86: Remove unnecessary mdelay() from cpu_disable_common() x86, ioapic: Document another case when level irq is seen as an edge x86, ioapic: Fix the EOI register detection mechanism x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq x86: SGI UV: Map low MMR ranges x86: apic: Print out SRAT table APIC id in hex x86: Re-get cfg_new in case reuse/move irq_desc x86: apic: Remove not needed #ifdef x86: io-apic: IO-APIC MMIO should not fail on resource insertion x86: Remove asm/apicnum.h x86: apic: Do not use stacked physid_mask_t x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit x86, ioapic: Use snrpintf while set names for IO-APIC resourses x86, apic: Use PAGE_SIZE instead of numbers x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs() x86: Use EOI register in io-apic on intel platforms x86: Force irq complete move during cpu offline x86: Remove move_cleanup_count from irq_cfg x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping ...	2009-12-05 15:31:25 -08:00
André Goddard Rosa	af901ca181	tree-wide: fix assorted typos all over the place That is "success", "unknown", "through", "performance", "[re\|un]mapping" , "access", "default", "reasonable", "[con]currently", "temperature" , "channel", "[un]used", "application", "example","hierarchy", "therefore" , "[over\|under]flow", "contiguous", "threshold", "enough" and others. Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-12-04 15:39:55 +01:00
Xiaotian Feng	dd4377b02d	x86/pat: Trivial: don't create debugfs for memtype if pat is disabled If pat is disabled (boot with nopat), there's no need to create debugfs for it, it's empty all the time. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1259236428-16329-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-26 15:05:03 +01:00
Yinghai Lu	b24c2a925a	x86: Move find_smp_config() earlier and avoid bootmem usage Move the find_smp_config() call to before bootmem is initialized. Use reserve_early() instead of reserve_bootmem() in it. This simplifies the code, we only need to call find_smp_config() once and can remove the now unneeded reserve parameter from x86_init_mpparse::find_smp_config. We thus also reduce x86's dependency on bootmem allocations. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B0BB9F2.70907@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-24 12:10:51 +01:00
H. Peter Anvin	eb41c8be89	x86, platform: Change is_untracked_pat_range() to bool; cleanup init - Change is_untracked_pat_range() to return bool. - Clean up the initialization of is_untracked_pat_range() -- by default, we simply point it at is_ISA_range() directly. - Move is_untracked_pat_range to the end of struct x86_platform, since it is the newest field. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-23 17:09:59 -08:00
H. Peter Anvin	8a27138924	x86, mm: is_untracked_pat_range() takes a normal semiclosed range is_untracked_pat_range() -- like its components, is_ISA_range() and is_GRU_range(), takes a normal semiclosed interval (>=, <) whereas the PAT code called it as if it took a closed range (>=, <=). Fix. Although this is a bug, I believe it is non-manifest, simply because none of the callers will call this with non-page-aligned addresses. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091119202341.GA4420@sgi.com>	2009-11-23 17:09:59 -08:00
Jack Steiner	fd12a0d69a	x86: UV SGI: Don't track GRU space in PAT GRU space is always mapped as WB in the page table. There is no need to track the mappings in the PAT. This also eliminates the "freeing invalid memtype" messages when the GRU space is unmapped. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20091119202341.GA4420@sgi.com> [ v2: fix build failure ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 19:47:42 +01:00
Jan Beulich	0e7810be30	x86: Suppress stack overrun message for init_task init_task doesn't get its stack end location set to STACK_END_MAGIC, and hence the message is confusing rather than helpful in this case. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4B06AEFE02000078000211F4@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 11:45:34 +01:00
Yinghai Lu	d9c2d5ac6a	x86, numa: Use near(er) online node instead of roundrobin for NUMA CPU to node mapping is set via the following sequence: 1. numa_init_array(): Set up roundrobin from cpu to online node 2. init_cpu_to_node(): Set that according to apicid_to_node[] according to srat only handle the node that is online, and leave other cpu on node without ram (aka not online) to still roundrobin. 3. later call srat_detect_node for Intel/AMD, will use first_online node or nearby node. Problem is that setup_per_cpu_areas() is not called between 2 and 3, the per_cpu for cpu on node with ram is on different node, and could put that on node with two hops away. So try to optimize this and add find_near_online_node() and call init_cpu_to_node(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 10:06:24 +01:00
Yinghai Lu	021428ad14	x86, numa, bootmem: Only free bootmem on NUMA failure path In the NUMA bootmem setup failure path we freed nodedata_phys incorrectly. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: David Rientjes <rientjes@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 10:00:48 +01:00
Yinghai Lu	163d3866cf	x86: apic: Print out SRAT table APIC id in hex Make it consistent with APIC MADT print out, for big systems APIC id in hex is more readable. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-23 09:56:30 +01:00
Ingo Molnar	96200591a3	Merge branch 'tracing/hw-breakpoints' into perf/core Conflicts: arch/x86/kernel/kprobes.c kernel/trace/Makefile Merge reason: hw-breakpoints perf integration is looking good in testing and in reviews, plus conflicts are mounting up - so merge & resolve. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-21 14:07:23 +01:00
Jan Beulich	350f8f5631	x86: Eliminate redundant/contradicting cache line size config options Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT (with inconsistent defaults), just having the latter suffices as the former can be easily calculated from it. To be consistent, also change X86_INTERNODE_CACHE_BYTES to X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA to account for last level cache line size (which here matters more than L1 cache line size). Finally, make sure the default value for X86_L1_CACHE_SHIFT, when X86_GENERIC is selected, is being seen before that for the individual CPU model options (other than on x86-64, where GENERIC_CPU is part of the choice construct, X86_GENERIC is a separate option on ix86). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Ravikiran Thirumalai <kiran@scalex86.org> Acked-by: Nick Piggin <npiggin@suse.de> LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-19 04:58:34 +01:00
Ingo Molnar	a7b63425a4	Merge branch 'perf/core' into perf/probes Resolved merge conflict in tools/perf/Makefile Merge reason: we want to queue up a dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-17 10:17:47 +01:00
Kees Cook	4b0f3b81eb	x86, mm: Report state of NX protections during boot It is possible for x86_64 systems to lack the NX bit either due to the hardware lacking support or the BIOS having turned off the CPU capability, so NX status should be reported. Additionally, anyone booting NX-capable CPUs in 32bit mode without PAE will lack NX functionality, so this change provides feedback for that case as well. Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>	2009-11-16 13:44:59 -08:00
H. Peter Anvin	4763ed4d45	x86, mm: Clean up and simplify NX enablement The 32- and 64-bit code used very different mechanisms for enabling NX, but even the 32-bit code was enabling NX in head_32.S if it is available. Furthermore, we had a bewildering collection of tests for the available of NX. This patch: a) merges the 32-bit set_nx() and the 64-bit check_efer() function into a single x86_configure_nx() function. EFER control is left to the head code. b) eliminates the nx_enabled variable entirely. Things that need to test for NX enablement can verify __supported_pte_mask directly, and cpu_has_nx gives the supported status of NX. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Tejun Heo <tj@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegardno@ifi.uio.no> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Chris Wright <chrisw@sous-sol.org> LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>	2009-11-16 13:44:59 -08:00
H. Peter Anvin	583140afb9	x86, pageattr: Make set_memory_(x\|nx) aware of NX support Make set_memory_x/set_memory_nx directly aware of if NX is supported in the system or not, rather than requiring that every caller assesses that support independently. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Starling <tstarling@wikimedia.org> Cc: Hannes Eder <hannes@hanneseder.net> LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com> Acked-by: Kees Cook <kees.cook@canonical.com>	2009-11-16 13:44:58 -08:00
Ingo Molnar	39dc78b651	Merge commit 'v2.6.32-rc7' into perf/core Merge reason: pick up perf fixlets Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-15 09:50:41 +01:00
Xiaotian Feng	2fb8f4e6a8	x86: pat: Remove ioremap_default() Commit: `b6ff32d`: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() consolidated reserve_memtype() and pat_x_mtrr_type, this made ioremap_default() same as ioremap_cache(). Remove the redundant function and change the only caller to use ioremap_cache. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1257845005-7938-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 10:34:05 +01:00
Xiaotian Feng	83ea05ea69	x86: pat: Clean up req_type special case for reserve_memtype() Commit: `b6ff32d`: x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() consolidated code in pat_x_mtrr_type() and reserve_memtype(), which removed the special case (req_type is -1) for the PAT-enabled part. We should also change comments and the PAT-disabled part. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1257844987-7906-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-10 10:34:04 +01:00
Xiaotian Feng	de2a47cf2b	x86: Fix error return sequence in __ioremap_caller() kernel missed to free memtype if get_vm_area_caller failed in __ioremap_caller. This patch introduces error path to fix this and cleans up the repetitive error return sequences that contributed to the creation of the bug. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <1257389031-20429-1-git-send-email-dfeng@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-08 12:48:58 +01:00
Suresh Siddha	e7d23dde9b	x86_64, cpa: Use only text section in set_kernel_text_rw/ro set_kernel_text_rw()/set_kernel_text_ro() are marking pages starting from _text to __start_rodata as RW or RO. With CONFIG_DEBUG_RODATA, there might be free pages (associated with padding the sections to 2MB large page boundary) between text and rodata sections that are given back to page allocator. So we should use only use the start (__text) and end (__stop___ex_table) of the text section in set_kernel_text_rw()/set_kernel_text_ro(). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024821.164525222@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-02 17:17:24 +01:00
Suresh Siddha	55ca3cc174	x86_64, ftrace: Make ftrace use kernel identity mapping to modify code On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead of the kernel text mapping to modify the kernel text. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-02 17:16:36 +01:00
Suresh Siddha	502f660466	x86, cpa: Fix kernel text RO checks in static_protection() Steven Rostedt reported that we are unconditionally making the kernel text mapping as read-only. i.e., if someone does cpa() to the kernel text area for setting/clearing any page table attribute, we unconditionally clear the read-write attribute for the kernel text mapping that is set at compile time. We should delay (to forbid the write attribute) and enforce only after the kernel has mapped the text as read-only. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <20091029024820.996634347@sbs-t61.sc.intel.com> [ marked kernel_set_to_readonly as __read_mostly ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-11-02 17:16:35 +01:00
Steven Rostedt	883242dd0e	tracing: allow to change permissions for text with dynamic ftrace enabled The commit `74e081797b` x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA prevents text sections from becoming read/write using set_memory_rw. The dynamic ftrace changes all text pages to read/write just before converting the calls to tracing to nops, and vice versa. I orginally just added a flag to allow this transaction when ftrace did the change, but I also found that when the CPA testing was running it would remove the read/write as well, and ftrace does not do the text conversion on boot up, and the CPA changes caused the dynamic tracer to fail on self tests. The current solution I have is to simply not to prevent change_page_attr from setting the RW bit for kernel text pages. Reported-by: Ingo Molnar <mingo@elte.hu> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>	2009-10-27 13:15:11 -04:00
Minchan Kim	b1258ac296	x86: Remove pfn in add_one_highpage_init() commit `cc9f7a0ccf` changed add_one_highpage_init. We don't use pfn any more. Let's remove unnecessary argument. This patch doesn't chage function behavior. This patch is based on v2.6.32-rc5. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Cc: Yinghai Lu <yhlu.kernel@gmail.com> LKML-Reference: <20091022112722.adc8e55c.minchan.kim@barrios-desktop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-23 13:39:26 +02:00
Ingo Molnar	4331595650	Merge branch 'perf/core' into perf/probes Conflicts: tools/perf/Makefile Merge reason: - fix the conflict - pick up the pr_*() infrastructure to queue up dependent patch Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-23 08:23:20 +02:00
Suresh Siddha	74e081797b	x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel text/rodata/data to small 4KB pages as they are mapped with different attributes (text as RO, RODATA as RO and NX etc). On x86_64, preserve the large page mappings for kernel text/rodata/data boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the RODATA section to be hugepage aligned and having same RWX attributes for the 2MB page boundaries Extra Memory pages padding the sections will be freed during the end of the boot and the kernel identity mappings will have different RWX permissions compared to the kernel text mappings. Kernel identity mappings to these physical pages will be mapped with smaller pages but large page mappings are still retained for kernel text,rodata,data mappings. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-10-20 14:46:00 +09:00
Suresh Siddha	b9af7c0d44	x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA In the first 2MB, kernel text is co-located with kernel static page tables setup by head_64.S. CONFIG_DEBUG_RODATA chops this 2MB large page mapping to small 4KB pages as we mark the kernel text as RO, leaving the static page tables as RW. With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement with 2% reduction in system time and 1% improvement in iowait idle time. To recover this, move the kernel static page tables to .data section, so that we don't have to break the first 2MB of kernel text to small pages with CONFIG_DEBUG_RODATA. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-10-20 14:46:00 +09:00
Frederic Weisbecker	0f8f86c7bd	Merge commit 'perf/core' into perf/hw-breakpoint Conflicts: kernel/Makefile kernel/trace/Makefile kernel/trace/trace.h samples/Makefile Merge reason: We need to be uptodate with the perf events development branch because we plan to rewrite the breakpoints API on top of perf events.	2009-10-18 01:12:33 +02:00
Ingo Molnar	bb3c3e8071	Merge commit 'v2.6.32-rc5' into perf/probes Conflicts: kernel/trace/trace_event_profile.c Merge reason: update to -rc5 and resolve conflict. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-17 09:58:25 +02:00
David Rientjes	adc1938994	x86: Interleave emulated nodes over physical nodes Add interleaved NUMA emulation support This patch interleaves emulated nodes over the system's physical nodes. This is required for interleave optimizations since mempolicies, for example, operate by iterating over a nodemask and act without knowledge of node distances. It can also be used for testing memory latencies and NUMA bugs in the kernel. There're a couple of ways to do this: - divide the number of emulated nodes by the number of physical nodes and allocate the result on each physical node, or - allocate each successive emulated node on a different physical node until all memory is exhausted. The disadvantage of the first option is, depending on the asymmetry in node capacities of each physical node, emulated nodes may substantially differ in size on a particular physical node compared to another. The disadvantage of the second option is, also depending on the asymmetry in node capacities of each physical node, there may be more emulated nodes allocated on a single physical node as another. This patch implements the second option; we sacrifice the possibility that we may have slightly more emulated nodes on a particular physical node compared to another in lieu of node size asymmetry. [ Note that "node capacity" of a physical node is not only a function of its addressable range, but also is affected by subtracting out the amount of reserved memory over that range. NUMA emulation only deals with available, non-reserved memory quantities. ] We ensure there is at least a minimal amount of available memory allocated to each node. We also make sure that at least this amount of available memory is available in ZONE_DMA32 for any node that includes both ZONE_DMA32 and ZONE_NORMAL. This patch also cleans the emulation code up by no longer passing the statically allocated struct bootnode array among the various functions. This init.data array is not allocated on the stack since it may be very large and thus it may be accessed at file scope. The WARN_ON() for nodes_cover_memory() when faking proximity domains is removed since it relies on successive nodes always having greater start addresses than previous nodes; with interleaving this is no longer always true. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 22:56:46 +02:00
David Rientjes	8716273cae	x86: Export srat physical topology This is the counterpart to "x86: export k8 physical topology" for SRAT. It is not as invasive because the acpi code already seperates node setup into detection and registration steps, with the exception of registering e820 active regions in acpi_numa_memory_affinity_init(). This is now moved to acpi_scan_nodes() if NUMA emulation is disabled or deferred. acpi_numa_init() now returns a value which specifies whether an underlying SRAT was located. If so, that topology can be used by the emulation code to interleave emulated nodes over physical nodes or to register the nodes for ACPI. acpi_get_nodes() may now be used to export the srat physical topology of the machine for NUMA emulation. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 22:56:46 +02:00
David Rientjes	8ee2debce3	x86: Export k8 physical topology To eventually interleave emulated nodes over physical nodes, we need to know the physical topology of the machine without actually registering it. This does the k8 node setup in two parts: detection and registration. NUMA emulation can then used the physical topology detected to setup the address ranges of emulated nodes accordingly. If emulation isn't used, the k8 nodes are registered as normal. Two formals are added to the x86 NUMA setup functions: `acpi' and `k8'. These represent whether ACPI or K8 NUMA has been detected; both cannot be true at the same time. This specifies to the NUMA emulation code whether an underlying physical NUMA topology exists and which interface to use. This patch deals solely with separating the k8 setup path into Northbridge detection and registration steps and leaves the ACPI changes for a subsequent patch. The `acpi' formal is added here, however, to avoid touching all the header files again in the next patch. This approach also ensures emulated nodes will not span physical nodes so the true memory latency is not misrepresented. k8_get_nodes() may now be used to export the k8 physical topology of the machine for NUMA emulation. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 22:56:45 +02:00
David Rientjes	1af5ba514f	x86: Clean up and add missing log levels for k8 Convert all printk's in arch/x86/mm/k8topology_64.c to use pr_info() or pr_err() appropriately. Adds log levels for messages currently lacking them. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Ankita Garg <ankita@in.ibm.com> Cc: Len Brown <len.brown@intel.com> LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 22:56:45 +02:00
Brian Gerst	ae24ffe5ec	x86, 64-bit: Move K8 B step iret fixup to fault entry asm Move the handling of truncated %rip from an iret fault to the fault entry path. This allows x86-64 to use the standard search_extable() function. Signed-off-by: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jan Beulich <jbeulich@novell.com> LKML-Reference: <1255357103-5418-1-git-send-email-brgerst@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 18:29:46 +02:00
Joe Perches	3c355863fb	testmmiotrace.c: Add and use pr_fmt(fmt) - Add #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt. - Strip MODULE_NAME from pr_<level>s. - Remove MODULE_NAME definition. Signed-off-by: Joe Perches <joe@perches.com> LKML-Reference: <3bb66cc7f85f77b9416902e1be7076f7e3f4ad48.1254701151.git.joe@perches.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-10-12 08:05:41 +02:00
Linus Torvalds	5bb241b325	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Remove redundant non-NUMA topology functions x86: early_printk: Protect against using the same device twice x86: Reduce verbosity of "PAT enabled" kernel message x86: Reduce verbosity of "TSC is reliable" message x86: mce: Use safer ways to access MCE registers x86: mce, inject: Use real inject-msg in raise_local x86: mce: Fix thermal throttling message storm x86: mce: Clean up thermal throttling state tracking code x86: split NX setup into separate file to limit unstack-protected code xen: check EFER for NX before setting up GDT mapping x86: Cleanup linker script using new linker script macros. x86: Use section .data.page_aligned for the idt_table. x86: convert to use __HEAD and HEAD_TEXT macros. x86: convert compressed loader to use __HEAD and HEAD_TEXT macros. x86: fix fragile computation of vsyscall address	2009-09-26 10:13:35 -07:00
Linus Torvalds	94e0fb086f	Merge branch 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel * 'drm-intel-next' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel: (57 commits) drm/i915: Handle ERESTARTSYS during page fault drm/i915: Warn before mmaping a purgeable buffer. drm/i915: Track purged state. drm/i915: Remove eviction debug spam drm/i915: Immediately discard any backing storage for uneeded objects drm/i915: Do not mis-classify clean objects as purgeable drm/i915: Whitespace correction for madv drm/i915: BUG_ON page refleak during unbind drm/i915: Search harder for a reusable object drm/i915: Clean up evict from list. drm/i915: Add tracepoints drm/i915: framebuffer compression for GM45+ drm/i915: split display functions by chip type drm/i915: Skip the sanity checks if the current relocation is valid drm/i915: Check that the relocation points to within the target drm/i915: correct FBC update when pipe base update occurs drm/i915: blacklist Acer AspireOne lid status ACPI: make ACPI button funcs no-ops if not built in drm/i915: prevent FIFO calculation overflows on 32 bits with high dotclocks drm/i915: intel_display.c handle latency variable efficiently ... Fix up trivial conflicts in drivers/gpu/drm/i915/{i915_dma.c\|i915_drv.h}	2009-09-24 10:30:41 -07:00
Linus Torvalds	db16826367	Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...	2009-09-24 07:53:22 -07:00
Ingo Molnar	d2ff6de537	Merge branch 'linus' into x86/urgent Merge reason: Queueing up dependent early-printk fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-24 12:59:18 +02:00
Roland Dreier	e23a8b6a8f	x86: Reduce verbosity of "PAT enabled" kernel message On modern systems, the kernel prints the message x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106 once for every CPU. This gets kind of ridiculous on huge systems; for example, on a 64-thread system I was lucky enough to get: dmesg\| grep 'PAT enabled' \| wc 64 704 5174 There is already a BUG() if non-boot CPUs have PAT capabilities that don't match the boot CPU, so just print the message on the boot CPU. (I kept the print after the wrmsrl() that enables PAT, so that the log output continues to mean that the system survived enabling PAT on the boot CPU) Signed-off-by: Roland Dreier <rolandd@cisco.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <adavdj92sso.fsf@cisco.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-24 11:35:19 +02:00
Rusty Russell	78f1c4d6b0	cpumask: use mm_cpumask() wrapper: x86 Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer). It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-09-24 09:34:52 +09:30
Frederic Weisbecker	d7a4b414ee	Merge commit 'linus/master' into tracing/kprobes Conflicts: kernel/trace/Makefile kernel/trace/trace.h kernel/trace/trace_event_types.h kernel/trace/trace_export.c Merge reason: Sync with latest significant tracing core changes.	2009-09-23 23:08:43 +02:00
KAMEZAWA Hiroyuki	81ac3ad906	kcore: register module area in generic way Some archs define MODULED_VADDR/MODULES_END which is not in VMALLOC area. This is handled only in x86-64. This patch make it more generic. And we can use vread/vwrite to access the area. Fix it. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:42 -07:00
KAMEZAWA Hiroyuki	3089aa1b0c	kcore: use registerd physmem information For /proc/kcore, each arch registers its memory range by kclist_add(). In usual, - range of physical memory - range of vmalloc area - text, etc... are registered but "range of physical memory" has some troubles. It doesn't updated at memory hotplug and it tend to include unnecessary memory holes. Now, /proc/iomem (kernel/resource.c) includes required physical memory range information and it's properly updated at memory hotplug. Then, it's good to avoid using its own code(duplicating information) and to rebuild kclist for physical memory based on /proc/iomem. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	9492587cf3	kcore: register text area in generic way Some 64bit arch has special segment for mapping kernel text. It should be entried to /proc/kcore in addtion to direct-linear-map, vmalloc area. This patch unifies KCORE_TEXT entry scattered under x86 and ia64. I'm not familiar with other archs (mips has its own even after this patch) but range of [_stext ..._end) is a valid area of text and it's not in direct-map area, defining CONFIG_ARCH_PROC_KCORE_TEXT is only a necessary thing to do. Note: I left mips as it is now. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	a0614da88b	kcore: register vmalloc area in generic way For /proc/kcore, vmalloc areas are registered per arch. But, all of them registers same range of [VMALLOC_START...VMALLOC_END) This patch unifies them. By this. archs which have no kclist_add() hooks can see vmalloc area correctly. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
KAMEZAWA Hiroyuki	c30bb2a25f	kcore: add kclist types Presently, kclist_add() only eats start address and size as its arguments. Considering to make kclist dynamically reconfigulable, it's necessary to know which kclists are for System RAM and which are not. This patch add kclist types as KCORE_RAM KCORE_VMALLOC KCORE_TEXT KCORE_OTHER This "type" is used in a patch following this for detecting KCORE_RAM. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-23 07:39:41 -07:00
Ingo Molnar	14c93e8eba	Merge branch 'bugfix' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/urgent	2009-09-23 14:35:10 +02:00
Linus Torvalds	991d79b0d1	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vegard/kmemcheck: kmemcheck: add missing braces to do-while in kmemcheck_annotate_bitfield kmemcheck: update documentation kmemcheck: depend on HAVE_ARCH_KMEMCHECK kmemcheck: remove useless check kmemcheck: remove duplicated #include	2009-09-22 08:07:54 -07:00
Jan Beulich	3c1596efe1	mm: don't use alloc_bootmem_low() where not strictly needed Since alloc_bootmem() will never return inaccessible (via virtual addressing) memory anyway, using the ..._low() variant only makes sense when the physical address range of the allocated memory must fulfill further constraints, espacially since on 64-bits (or more generally in all cases where the pools the two variants allocate from are than the full available range. Probably the use in alloc_tce_table() could also be eliminated (based on code inspection of pci-calgary_64.c), but that seems too risky given I know nothing about that hardware and have no way to test it. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:38 -07:00
Geert Uytterhoeven	cc013a8890	arches: drop superfluous casts in nr_free_pages() callers Commit `9617729941` ("Drop free_pages()") modified nr_free_pages() to return 'unsigned long' instead of 'unsigned int'. This made the casts to 'unsigned long' in most callers superfluous, so remove them. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kyle McMartin <kyle@mcmartin.ca> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Haavard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Chris Zankel <zankel@tensilica.com> Cc: Michal Simek <monstr@monstr.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-09-22 07:17:34 -07:00
Jeremy Fitzhardinge	c44c9ec0f3	x86: split NX setup into separate file to limit unstack-protected code Move the NX setup into a separate file so that it can be compiled without stack-protection while leaving the rest of the mm/init code protected. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-21 13:56:58 -07:00
Jeremy Fitzhardinge	b75fe4e5b8	xen: check EFER for NX before setting up GDT mapping x86-64 assumes NX is available by default, so we need to explicitly check for it before using NX. Some first-generation Intel x86-64 processors didn't support NX, and even recent systems allow it to be disabled in BIOS. [ Impact: prevent Xen crash on NX-less 64-bit machines ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org>	2009-09-21 13:49:43 -07:00
Ingo Molnar	cdd6c482c9	perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f \| grep -vE 'oprofile\|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N \| sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-21 14:28:04 +02:00
Jaswinder Singh Rajput	fcf9892161	includecheck fix: x86, shadow.c fix the following 'make includecheck' warning: arch/x86/mm/kmemcheck/shadow.c: linux/module.h is included more than once. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <1247065179.4382.51.camel@ht.satnam>	2009-09-20 16:00:38 +05:30
Linus Torvalds	ca043a66ae	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: don't use rb-tree based lookup in reserve_memtype() x86: Increase MIN_GAP to include randomized stack	2009-09-17 20:58:11 -07:00
H. Peter Anvin	3bb045f1e2	Merge branch 'x86/pat' into x86/urgent Merge reason: Suresh Siddha (1): x86, pat: don't use rb-tree based lookup in reserve_memtype() ... requires previous x86/pat commits already pushed to Linus. Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-17 14:40:49 -07:00
Suresh Siddha	dcb73bf402	x86, pat: don't use rb-tree based lookup in reserve_memtype() Recent enhancement of rb-tree based lookup exposed a bug with the lookup mechanism in the reserve_memtype() which ensures that there are no conflicting memtype requests for the memory range. memtype_rb_search() returns an entry which has a start address <= new start address. And from here we traverse the linear linked list to check if there any conflicts with the existing mappings. As the rbtree is based on the start address of the memory range, it is quite possible that we have several overlapped mappings whose start address is much less than new requested start but the end is >= new requested end. This results in conflicting memtype mappings. Same bug exists with the old code which uses cached_entry from where we traverse the linear linked list. But the new rb-tree code exposes this bug fairly easily. For now, don't use the memtype_rb_search() and always start the search from the head of linear linked list in reserve_memtype(). Linear linked list for most of the systems grow's to few 10's of entries(as we track memory type of RAM pages using struct page). So we should be ok for now. We still retain the rbtree and use it to speed up free_memtype() which doesn't have the same bug(as we know what exactly we are searching for in free_memtype). Also use list_for_each_entry_from() in free_memtype() so that we start the search from rb-tree lookup result. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <1253136483.4119.12.camel@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-09-17 14:07:58 -07:00
Andi Kleen	a6e04aa929	HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 Add VM_FAULT_HWPOISON handling to the x86 page fault handler. This is very similar to VM_FAULT_OOM, the only difference is that a different si_code is passed to user space and the new addr_lsb field is initialized. v2: Make the printk more verbose/unique Cc: x86@kernel.org Signed-off-by: Andi Kleen <ak@linux.intel.com>	2009-09-16 11:50:09 +02:00
Linus Torvalds	ada3fa1505	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c	2009-09-15 09:39:44 -07:00
Linus Torvalds	227423904c	Merge branch 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, pat: Fix cacheflush address in change_page_attr_set_clr() mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set x86: Fix earlyprintk=dbgp for machines without NX x86, pat: Sanity check remap_pfn_range for RAM region x86, pat: Lookup the protection from memtype list on vm_insert_pfn() x86, pat: Add lookup_memtype to get the current memtype of a paddr x86, pat: Use page flags to track memtypes of RAM pages x86, pat: Generalize the use of page flag PG_uncached x86, pat: Add rbtree to do quick lookup in memtype tracking x86, pat: Add PAT reserve free to io_mapping* APIs x86, pat: New i/f for driver to request memtype for IO regions x86, pat: ioremap to follow same PAT restrictions as other PAT users x86, pat: Keep identity maps consistent with mmaps even when pat_disabled x86, mtrr: make mtrr_aps_delayed_init static bool x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled x86: Fix an incorrect argument of reserve_bootmem() x86: Fix system crash when loading with "reservetop" parameter	2009-09-15 09:19:38 -07:00
Ingo Molnar	dca2d6ac09	Merge branch 'linus' into tracing/hw-breakpoints Conflicts: arch/x86/kernel/process_64.c Semantic conflict fixed in: arch/x86/kvm/x86.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-15 12:18:15 +02:00
Linus Torvalds	69def9f05d	Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm * 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits) MAINTAINERS: update KVM entry KVM: correct error-handling code KVM: fix compile warnings on s390 KVM: VMX: Check cpl before emulating debug register access KVM: fix misreporting of coalesced interrupts by kvm tracer KVM: x86: drop duplicate kvm_flush_remote_tlb calls KVM: VMX: call vmx_load_host_state() only if msr is cached KVM: VMX: Conditionally reload debug register 6 KVM: Use thread debug register storage instead of kvm specific data KVM guest: do not batch pte updates from interrupt context KVM: Fix coalesced interrupt reporting in IOAPIC KVM guest: fix bogus wallclock physical address calculation KVM: VMX: Fix cr8 exiting control clobbering by EPT KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp KVM: Document KVM_CAP_IRQCHIP KVM: Protect update_cr8_intercept() when running without an apic KVM: VMX: Fix EPT with WP bit change during paging KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors KVM: x86 emulator: Add adc and sbb missing decoder flags KVM: Add missing #include ...	2009-09-14 17:43:43 -07:00
Linus Torvalds	b8cb48aae1	Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: split __phys_addr out into separate file xen: use stronger barrier after unlocking lock xen: only enable interrupts while actually blocking for spinlock xen: make -fstack-protector work under Xen	2009-09-14 10:23:49 -07:00
Linus Torvalds	7dfd54a905	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, highmem_32.c: Clean up comment x86, pgtable.h: Clean up types x86: Clean up dump_pagetable()	2009-09-14 07:59:32 -07:00
Linus Torvalds	8fafa0a789	Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Decrease the level of some NUMA messages to KERN_DEBUG	2009-09-14 07:57:50 -07:00
Linus Torvalds	15b0404272	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Make memtype_seq_ops const x86: uv: Clean up uv_ptc_init(), use proc_create() x86: Use printk_once() x86/cpu: Clean up various files a bit x86: Remove duplicated #include x86, ipi: Clean up safe_smp_processor_id() by using the cpu_has_apic() macro helper x86: Clean up idt_descr and idt_tableby using NR_VECTORS instead of hardcoded number x86: Further clean up of mtrr/generic.c x86: Clean up mtrr/main.c x86: Clean up mtrr/state.c x86: Clean up mtrr/mtrr.h x86: Clean up mtrr/if.c x86: Clean up mtrr/generic.c x86: Clean up mtrr/cyrix.c x86: Clean up mtrr/cleanup.c x86: Clean up mtrr/centaur.c x86: Clean up mtrr/amd.c: x86: ds.c fix invalid assignment	2009-09-14 07:56:43 -07:00
Eric Anholt	e517a5e970	agp/intel: Fix the pre-9xx chipset flush. Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had serious stability issues. Back in May a wbinvd was added to the DRM to work around much of the problem. Some failure remained -- easily visible by dragging a window around on an X -retro desktop, or by looking at bugzilla. The chipset flush was on the right track -- hitting the right amount of memory, and it appears to be the only way to flush on these chipsets, but the flush page was mapped uncached. As a result, the writes trying to clear the writeback cache ended up bypassing the cache, and not flushing anything! The wbinvd would flush out other writeback data and often cause the data we wanted to get flushed, but not always. By removing the setting of the page to UC and instead just clflushing the data we write to try to flush it, we get the desired behavior with no wbinvd. This exports clflush_cache_range(), which was laying around and happened to basically match the code I was otherwise going to copy from the DRM. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org> Cc: stable@kernel.org	2009-09-11 11:39:23 -07:00
Michal Hocko	80938332d8	x86: Increase MIN_GAP to include randomized stack Currently we are not including randomized stack size when calculating mmap_base address in arch_pick_mmap_layout for topdown case. This might cause that mmap_base starts in the stack reserved area because stack is randomized by 1GB for 64b (8MB for 32b) and the minimum gap is 128MB. If the stack really grows down to mmap_base then we can get silent mmap region overwrite by the stack values. Let's include maximum stack randomization size into MIN_GAP which is used as the low bound for the gap in mmap. Signed-off-by: Michal Hocko <mhocko@suse.cz> LKML-Reference: <1252400515-6866-1-git-send-email-mhocko@suse.cz> Acked-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Stable Team <stable@kernel.org>	2009-09-10 17:00:12 -07:00
Frederic Weisbecker	8f8ffe2485	Merge commit 'tracing/core' into tracing/kprobes Conflicts: kernel/trace/trace_export.c kernel/trace/trace_kprobe.c Merge reason: This topic branch lacks an important build fix in tracing/core: `0dd7b74787`: tracing: Fix double CPP substitution in TRACE_EVENT_FN that prevents from multiple tracepoint headers inclusion crashes. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-09-11 01:09:23 +02:00
Jeremy Fitzhardinge	78c86e5e56	x86: split __phys_addr out into separate file Split __phys_addr out into its own file so we can disable -fstack-protector in a fine-grained fashion. Also it doesn't have terribly much to do with the rest of ioremap.c. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-10 11:48:55 -07:00
Avi Kivity	256cd2ef4f	x86: Export kmap_atomic_to_page() Needed by KVM. Signed-off-by: Avi Kivity <avi@redhat.com>	2009-09-10 08:33:22 +03:00
Jeremy Fitzhardinge	577eebeae3	xen: make -fstack-protector work under Xen -fstack-protector uses a special per-cpu "stack canary" value. gcc generates special code in each function to test the canary to make sure that the function's stack hasn't been overrun. On x86-64, this is simply an offset of %gs, which is the usual per-cpu base segment register, so setting it up simply requires loading %gs's base as normal. On i386, the stack protector segment is %gs (rather than the usual kernel percpu %fs segment register). This requires setting up the full kernel GDT and then loading %gs accordingly. We also need to make sure %gs is initialized when bringing up secondary cpus too. To keep things consistent, we do the full GDT/segment register setup on both architectures. Because we need to avoid -fstack-protected code before setting up the GDT and because there's no way to disable it on a per-function basis, several files need to have stack-protector inhibited. [ Impact: allow Xen booting with stack-protector enabled ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>	2009-09-09 16:37:39 -07:00
Jack Steiner	fa526d0d64	x86, pat: Fix cacheflush address in change_page_attr_set_clr() Fix address passed to cpa_flush_range() when changing page attributes from WB to UC. The address (*addr) is modified by __change_page_attr_set_clr(). The result is that the pages being flushed start at the _end_ of the changed range instead of the beginning. This should be considered for 2.6.30-stable and 2.6.31-stable. Signed-off-by: Jack Steiner <steiner@sgi.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: Stable team <stable@kernel.org>	2009-09-09 14:05:24 -07:00
Ingo Molnar	a1922ed661	Merge branch 'tracing/core' into tracing/hw-breakpoints Conflicts: arch/Kconfig kernel/trace/trace.h Merge reason: resolve the conflicts, plus adopt to the new ring-buffer APIs. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-07 08:19:51 +02:00
Tobias Klauser	d535e4319a	x86: Make memtype_seq_ops const Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-06 06:33:33 +02:00
Rafael J. Wysocki	23b6c52cf5	x86: Decrease the level of some NUMA messages to KERN_DEBUG Some NUMA messages in srat_32.c are confusing to users, because they seem to indicate errors, while in fact they reflect normal behaviour. Decrease the level of these messages to KERN_DEBUG so that they don't show up unnecessarily. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> LKML-Reference: <200909050107.45175.rjw@sisk.pl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-09-06 06:32:23 +02:00
Pekka Enberg	8e019366ba	kmemleak: Don't scan uninitialized memory when kmemcheck is enabled Ingo Molnar reported the following kmemcheck warning when running both kmemleak and kmemcheck enabled: PM: Adding info for No Bus:vcsa7 WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (f6f6e1a4) d873f9f600000000c42ae4c1005c87f70000000070665f666978656400000000 i i i i u u u u i i i i i i i i i i i i i i i i i i i i i u u u ^ Pid: 3091, comm: kmemleak Not tainted (2.6.31-rc7-tip #1303) P4DC6 EIP: 0060:[<c110301f>] EFLAGS: 00010006 CPU: 0 EIP is at scan_block+0x3f/0xe0 EAX: f40bd700 EBX: f40bd780 ECX: f16b46c0 EDX: 00000001 ESI: f6f6e1a4 EDI: 00000000 EBP: f10f3f4c ESP: c2605fcc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: e89a4844 CR3: 30ff1000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff4ff0 DR7: 00000400 [<c110313c>] scan_object+0x7c/0xf0 [<c1103389>] kmemleak_scan+0x1d9/0x400 [<c1103a3c>] kmemleak_scan_thread+0x4c/0xb0 [<c10819d4>] kthread+0x74/0x80 [<c10257db>] kernel_thread_helper+0x7/0x3c [<ffffffff>] 0xffffffff kmemleak: 515 new suspected memory leaks (see /sys/kernel/debug/kmemleak) kmemleak: 42 new suspected memory leaks (see /sys/kernel/debug/kmemleak) The problem here is that kmemleak will scan partially initialized objects that makes kmemcheck complain. Fix that up by skipping uninitialized memory regions when kmemcheck is enabled. Reported-by: Ingo Molnar <mingo@elte.hu> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-09-04 16:05:55 +01:00
Masami Hiramatsu	62c9295f9d	kprobes/x86: Fix to add __kprobes to in-kernel fault handing functions Add __kprobes to the functions which handle in-kernel fixable page faults. Since kprobes can cause those in-kernel page faults by accessing kprobe data structures, probing those fault functions will cause fault-int3-loop (do_page_fault has already been marked as __kprobes). Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20090827172311.8246.92725.stgit@localhost.localdomain> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-08-30 03:08:26 +02:00
H. Peter Anvin	b855192c08	Merge branch 'x86/urgent' into x86/pat Reason: Change to is_new_memtype_allowed() in x86/urgent Resolved semantic conflicts in: arch/x86/mm/pat.c arch/x86/mm/ioremap.c Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 17:24:28 -07:00
Venkatesh Pallipadi	d886c73cd4	x86, pat: Sanity check remap_pfn_range for RAM region Add sanity check for remap_pfn_range of RAM regions using lookup_memtype(). Previously, we did not have anyway to get the type of RAM memory regions as they were tracked using a single bit in page_struct (WB, nonWB). Now we can get the actual type from page struct (WB, WC, UC_MINUS) and make sure the requester gets that type. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:35 -07:00
Venkatesh Pallipadi	1087637616	x86, pat: Lookup the protection from memtype list on vm_insert_pfn() Lookup the reserved memtype during vm_insert_pfn and use that memtype for the new mapping. This takes care or handling of vm_insert_pfn() interface in track_pfn_vma*/untrack_pfn_vma. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:32 -07:00
Venkatesh Pallipadi	637b86e75f	x86, pat: Add lookup_memtype to get the current memtype of a paddr Add a new routine lookup_memtype() to get the current memtype based on the PAT reserves and frees. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:28 -07:00
Venkatesh Pallipadi	f584174096	x86, pat: Use page flags to track memtypes of RAM pages Change reserve_ram_pages_type and free_ram_pages_type to use 2 page flags to track UC_MINUS, WC, WB and default types. Previous RAM tracking just tracked WB or NonWB, which was not complete and did not allow tracking of RAM fully and there was no way to get the actual type reserved by looking at the page flags. We use the memtype_lock spinlock for atomicity in dealing with memtype tracking in struct page. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:24 -07:00
Venkatesh Pallipadi	335ef896d4	x86, pat: Add rbtree to do quick lookup in memtype tracking PAT memtype tracking uses a linear link list to keep track of IO (non-RAM) regions and their memtypes. The code used a last_accessed pointer as a cache to speedup the lookup. As per discussions with H. Peter Anvin a while back, having a rbtree here will avoid bad performances in pathological cases where we may end up with huge linked list. This may not add any noticable performance speedup in normal case as the number of entires in PAT memtype list tend to be ~20-30 range. The patch removes the "cached_entry" logic as with rbtree we have more generic way of speeding up the lookup. With this patch, we use rbtree to do the quick lookup. We still use linked list as the memtype range tracked can be of different sizes and can overlap in different ways. We also keep track of usage counts with linked list. Example: Multiple ioremaps with different sizes uncached-minus @ 0xfffff00000-0xfffff04000 uncached-minus @ 0xfffff02000-0xfffff03000 And one userlevel mmap and the thread forks a new process uncached-minus @ 0xbf453000-0xbf454000 uncached-minus @ 0xbf453000-0xbf454000 Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:19 -07:00
Venkatesh Pallipadi	9e36fda0b3	x86, pat: Add PAT reserve free to io_mapping* APIs io_mapping_* interfaces were added, mainly for graphics drivers. Make this interface go through the PAT reserve/free, instead of hardcoding WC mapping. This makes sure that there are no aliases due to unconditional WC setting. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:16 -07:00
Venkatesh Pallipadi	9fd126bc74	x86, pat: New i/f for driver to request memtype for IO regions Add new routines to request memtype for IO regions. This will currently be a backend for io_mapping_* routines. But, it can also be made available to drivers directly in future, in case it is needed. reserve interface reserves the memory, makes sure we have a compatible memory type available and keeps the identity map in sync when needed. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:10 -07:00
Venkatesh Pallipadi	279e669b3f	x86, pat: ioremap to follow same PAT restrictions as other PAT users ioremap has this hard-coded check for new type and requested type. That check differs from other PAT users like /dev/mem mmap, remap_pfn_range in only one condition where requested type is UC_MINUS and new type is WC. Under that condition, ioremap fails. But other PAT interfaces succeed with a WC mapping. Change to make ioremap be in sync with other PAT APIs and use the same macro as others. Also changes the error print to KERN_ERR instead of pr_debug. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:41:07 -07:00
Venkatesh Pallipadi	5fc517466d	x86, pat: Keep identity maps consistent with mmaps even when pat_disabled Make reserve_memtype internally take care of pat disabled case and fallback to default return values. Remove the specific pat_disabled checks in track_* routines. Change kernel_map_sync_memtype to sync identity map even when pat_disabled. This change ensures that, even for pat_disabled case, we take care of keeping identity map in sync. Before this patch, in pat disabled case, ioremap() keeps the identity maps in sync and other APIs like pci and /dev/mem mmap don't, which is not a very consistent behavior. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-26 15:40:58 -07:00
Linus Torvalds	9f459fadbb	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Fix build with older binutils and consolidate linker script x86: Fix an incorrect argument of reserve_bootmem() x86: add vmlinux.lds to targets in arch/x86/boot/compressed/Makefile xen: rearrange things to fix stackprotector x86: make sure load_percpu_segment has no stackprotector i386: Fix section mismatches for init code with !HOTPLUG_CPU x86, pat: Allow ISA memory range uncacheable mapping requests	2009-08-25 11:23:25 -07:00
Amerigo Wang	a6a06f7b57	x86: Fix an incorrect argument of reserve_bootmem() This line looks suspicious, because if this is true, then the 'flags' parameter of function reserve_bootmem_generic() will be unused when !CONFIG_NUMA. I don't think this is what we want. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-24 20:22:55 +02:00
Linus Torvalds	b04e6373d6	x86: don't call '->send_IPI_mask()' with an empty mask As noted in `83d349f35e` ("x86: don't send an IPI to the empty set of CPU's"), some APIC's will be very unhappy with an empty destination mask. That commit added a WARN_ON() for that case, and avoided the resulting problem, but didn't fix the underlying reason for why those empty mask cases happened. This fixes that, by checking the result of 'cpumask_andnot()' of the current CPU actually has any other CPU's left in the set of CPU's to be sent a TLB flush, and not calling down to the IPI code if the mask is empty. The reason this started happening at all is that we started passing just the CPU mask pointers around in commit `4595f9620` ("x86: change flush_tlb_others to take a const struct cpumask"), and when we did that, the cpumask was no longer thread-local. Before that commit, flush_tlb_mm() used to create it's own copy of 'mm->cpu_vm_mask' and pass that copy down to the low-level flush routines after having tested that it was not empty. But after changing it to just pass down the CPU mask pointer, the lower level TLB flush routines would now get a pointer to that 'mm->cpu_vm_mask', and that could still change - and become empty - after the test due to other CPU's having flushed their own TLB's. See http://bugzilla.kernel.org/show_bug.cgi?id=13933 for details. Tested-by: Thomas Björnell <thomas.bjornell@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-08-21 09:48:10 -07:00
Amerigo Wang	3e0e1e9c5a	x86: Fix an incorrect argument of reserve_bootmem() This line looks suspicious, because if this is true, then the 'flags' parameter of function reserve_bootmem_generic() will be unused when !CONFIG_NUMA. I don't think this is what we want. Signed-off-by: WANG Cong <amwang@redhat.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: akpm@linux-foundation.org LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-21 16:40:31 +02:00
Suresh Siddha	1adcaafe74	x86, pat: Allow ISA memory range uncacheable mapping requests Max Vozeler reported: > Bug 13877 - bogl-term broken with CONFIG_X86_PAT=y, works with =n > > strace of bogl-term: > 814 mmap2(NULL, 65536, PROT_READ\|PROT_WRITE, MAP_SHARED, 4, 0) > = -1 EAGAIN (Resource temporarily unavailable) > 814 write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n", > 57) = 57 PAT code maps the ISA memory range as WB in the PAT attribute, so that fixed range MTRR registers define the actual memory type (UC/WC/WT etc). But the upper level is_new_memtype_allowed() API checks are failing, as the request here is for UC and the return tracked type is WB (Tracked type is WB as MTRR type for this legacy range potentially will be different for each 4k page). Fix is_new_memtype_allowed() by always succeeding the ISA address range checks, as the null PAT (WB) and def MTRR fixed range register settings satisfy the memory type needs of the applications that map the ISA address range. Reported-and-Tested-by: Max Vozeler <xam@debian.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-08-17 14:12:44 -07:00
Tejun Heo	e933a73f48	percpu: kill lpage first chunk allocator With x86 converted to embedding allocator, lpage doesn't have any user left. Kill it along with cpa handling code. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com>	2009-08-14 15:00:53 +09:00
Tejun Heo	384be2b18a	Merge branch 'percpu-for-linus' into percpu-for-next Conflicts: arch/sparc/kernel/smp_64.c arch/x86/kernel/cpu/perf_counter.c arch/x86/kernel/setup_percpu.c drivers/cpufreq/cpufreq_ondemand.c mm/percpu.c Conflicts in core and arch percpu codes are mostly from commit ed78e1e078dd44249f88b1dd8c76dafb39567161 which substituted many num_possible_cpus() with nr_cpu_ids. As for-next branch has moved all the first chunk allocators into mm/percpu.c, the changes are moved from arch code to mm/percpu.c. Signed-off-by: Tejun Heo <tj@kernel.org>	2009-08-14 14:45:31 +09:00
Linus Torvalds	067e18133f	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Work around compilation warning in arch/x86/kernel/apm_32.c x86, UV: Complete IRQ interrupt migration in arch_enable_uv_irq() x86, 32-bit: Fix double accounting in reserve_top_address() x86: Don't use current_cpu_data in x2apic phys_pkg_id x86, UV: Fix UV apic mode x86, UV: Fix macros for accessing large node numbers x86, UV: Delete mapping of MMR rangs mapped by BIOS x86, UV: Handle missing blade-local memory correctly x86: fix assembly constraints in native_save_fl() x86, msr: execute on the correct CPU subset x86: Fix assert syntax in vmlinux.lds.S x86: Make 64-bit efi_ioremap use ioremap on MMIO regions x86: Add quirk to make Apple MacBook5,2 use reboot=pci x86: Fix CPA memtype reserving in the set_pages_array*() cases x86, pat: Fix set_memory_wc related corruption x86: fix section mismatch for i386 init code	2009-08-04 15:28:59 -07:00
Jan Beulich	6abf655109	x86, 32-bit: Fix double accounting in reserve_top_address() With VMALLOC_END included in the calculation of MAXMEM (as of 2.6.28) it is no longer correct to also bump __VMALLOC_RESERVE in reserve_top_address(). Doing so results in needlessly small lowmem. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4A71DD2A020000780000D482@vpn.id2.novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-04 16:27:29 +02:00
Thomas Hellstrom	8523acfe40	x86: Fix CPA memtype reserving in the set_pages_array*() cases The code was incorrectly reserving memtypes using the page virtual address instead of the physical address. Furthermore, the code was not ignoring highmem pages as it ought to. ( upstream does not pass in highmem pages yet - but upcoming graphics code will do it and there's no reason to not handle this properly in the CPA APIs.) Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=13884 Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: <stable@kernel.org> Cc: dri-devel@lists.sourceforge.net Cc: venkatesh.pallipadi@intel.com LKML-Reference: <1249284345-7654-1-git-send-email-thellstrom@vmware.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-08-03 19:36:09 +02:00
Pallipadi, Venkatesh	bdc6340f4e	x86, pat: Fix set_memory_wc related corruption Changeset `3869c4aa18` that went in after 2.6.30-rc1 was a seemingly small change to _set_memory_wc() to make it complaint with SDM requirements. But, introduced a nasty bug, which can result in crash and/or strange corruptions when set_memory_wc is used. One such crash reported here http://lkml.org/lkml/2009/7/30/94 Actually, that changeset introduced two bugs. * change_page_attr_set() takes &addr as first argument and can the addr value might have changed on return, even for single page change_page_attr_set() call. That will make the second change_page_attr_set() in this routine operate on unrelated addr, that can eventually cause strange corruptions and bad page state crash. * The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise cache attribute will not be WC (will be UC instead). The patch below fixes both these problems. Sending a single patch to fix both the problems, as the change is to the same line of code. The change to have a addr_copy is not very clean. But, it is simpler than making more changes through various routines in pageattr.c. A huge thanks to Jerome for reporting this problem and providing a simple test case that helped us root cause the problem. Reported-by: Jerome Glisse <glisse@freedesktop.org> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com> Acked-by: Dave Airlie <airlied@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-07-30 17:48:34 -07:00
Linus Torvalds	84210aeb4a	Merge branch 'drm-radeon-kms' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-radeon-kms' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (35 commits) drm/radeon: set fb aperture sizes for framebuffer handoff. drm/ttm: fix highuser vs dma32 confusion. drm/radeon: Fix size used for benchmarking BO copies. drm/radeon: Add radeon.test parameter for running BO GPU copy tests. drm/radeon/kms: allow interruptible waits for objects. drm/ttm: powerpc: Fix Highmem cache flushing. x86: Export kmap_atomic_prot() needed for TTM. drm/ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes. drm/ttm: Fix an oops and sync object leak. drm/radeon/kms: vram sizing on certain r100 chips needs workaround. drm/radeon: Pay more attention to object placement requested by userspace. drm/radeon: Fall back to evicting BOs with memcpy if necessary. drm/radeon: Don't unreserve twice on failure to validate. drm/radeon/kms: fix bandwidth computation on avivo hardware drm/radeon/kms: add initial colortiling support. drm/radeon/kms: fix hotspot handling on pre-avivo chips drm/radeon/kms: enable frac fb divs on rs600/rs690/rs740 drm/radeon/kms: add PLL flag to prefer frequencies <= the target freq drm/radeon/kms: block RN50 from using 3D engine. drm/radeon/kms: fix VRAM sizing like DDX does it. ...	2009-07-29 12:31:59 -07:00
Thomas Hellstrom	73ba651fc2	x86: Export kmap_atomic_prot() needed for TTM. This functionality is needed to kmap_atomic() highmem pages that may potentially have or are about to set up other mappings with non-standard caching attributes. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2009-07-29 15:56:22 +10:00
Linus Torvalds	ca597a02cd	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure x86, amd: Don't probe for extended APIC ID if APICs are disabled x86, mce: Rename incorrect macro name "CONFIG_X86_THRESHOLD" x86-64: Fix bad_srat() to clear all state x86, mce: Fix set_trigger() accessor x86: Fix movq immediate operand constraints in uaccess.h x86: Fix movq immediate operand constraints in uaccess_64.h x86: Add reboot fixup for SBC-fitPC2 x86: Include all of .data.* sections in _edata on 64-bit x86: Add quirk for Intel DG45ID board to avoid low memory corruption	2009-07-27 12:18:09 -07:00
Benjamin Herrenschmidt	9e1b32caa5	mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() Upcoming paches to support the new 64-bit "BookE" powerpc architecture will need to have the virtual address corresponding to PTE page when freeing it, due to the way the HW table walker works. Basically, the TLB can be loaded with "large" pages that cover the whole virtual space (well, sort-of, half of it actually) represented by a PTE page, and which contain an "indirect" bit indicating that this TLB entry RPN points to an array of PTEs from which the TLB can then create direct entries. Thus, in order to invalidate those when PTE pages are deleted, we need the virtual address to pass to tlbilx or tlbivax instructions. The old trick of sticking it somewhere in the PTE page struct page sucks too much, the address is almost readily available in all call sites and almost everybody implemets these as macros, so we may as well add the argument everywhere. I added it to the pmd and pud variants for consistency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV] Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-27 12:10:38 -07:00
Andi Kleen	429b2b319a	x86-64: Fix bad_srat() to clear all state Need to clear both nodes and nodes_add state for start/end. Signed-off-by: Andi Kleen <ak@linux.intel.com> LKML-Reference: <20090718065657.GA2898@basil.fritz.box> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org	2009-07-21 15:20:01 -07:00
Roland Dreier	a1a08d1cb0	x86: Remove spurious printk level from segfault message Since commit `5fd29d6c` ("printk: clean up handling of log-levels and newlines"), the kernel logs segfaults like: <6>gnome-power-man[24509]: segfault at 20 ip 00007f9d4950465a sp 00007fffbb50fc70 error 4 in libgobject-2.0.so.0.2103.0[7f9d494f7000+45000] with the extra "<6>" being KERN_INFO. This happens because the printk in show_signal_msg() started with KERN_CONT and then used "%s" to pass in the real level; and KERN_CONT is no longer an empty string, and printk only pays attention to the level at the very beginning of the format string. Therefore, remove the KERN_CONT from this printk, since it is now actively causing problems (and never really made any sense). Signed-off-by: Roland Dreier <roland@digitalvampire.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <874otjitkj.fsf@shaolin.home.digitalvampire.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-11 09:56:19 +02:00
Yinghai Lu	44b5728095	x86: don't clear nodes_states[N_NORMAL_MEMORY] when numa is not compiled in Alex found that specjbb2005 still can not run with hugepages on an x86-64 machine. This only happens when numa is not compiled in. The root cause: node_set_state will not set it back for us in that case, so don't clear that when numa is not select in config [ v2: use node_clear_state instead ] Reported-and-Tested-by: Alex Shi <alex.shi@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:32:50 -07:00
Joe Perches	ad361c9884	Remove multiple KERN_ prefixes from printk formats Commit `5fd29d6ccb` ("printk: clean up handling of log-levels and newlines") changed printk semantics. printk lines with multiple KERN_<level> prefixes are no longer emitted as before the patch. <level> is now included in the output on each additional use. Remove all uses of multiple KERN_<level>s in formats. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-07-08 10:30:03 -07:00
Linus Torvalds	faf80d62e4	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix usage of bios intcall() x86: Remove unused function lapic_watchdog_ok() x86: Remove unused variable disable_x2apic x86, kvm: Fix section mismatches in kvm.c x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user x86: Fix fixmap page order for FIX_TEXT_POKE0,1 amd-iommu: set evt_buf_size correctly amd-iommu: handle alias entries correctly in init code x86: Fix printk call in print_local_apic() x86: Declare check_efer() before it gets used x86: Mark device_nb as static and fix NULL noise x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1 xen: Use kcalloc() in xen_init_IRQ() x86: Fix fixmap ordering x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c	2009-07-06 17:45:44 -07:00
Tejun Heo	8c4bfc6e88	x86,percpu: generalize lpage first chunk allocator Generalize and move x86 setup_pcpu_lpage() into pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free/map memory, pcpu_lpage_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, factor out pcpu_calc_fc_sizes() which is common to pcpu_embed_first_chunk() and pcpu_lpage_first_chunk(). [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-07-04 08:10:59 +09:00
Vegard Nossum	414f3251aa	kmemcheck: remove useless check This check is a left-over from ancient times. We now have the equivalent check much earlier in both the page fault handler and the debug trap handler (the calls to kmemcheck_active()). Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-07-01 22:28:42 +02:00
Huang Weiyi	a4a874a906	kmemcheck: remove duplicated #include Remove duplicated #include in arch/x86/mm/kmemcheck/shadow.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-07-01 22:28:27 +02:00
Jaswinder Singh Rajput	76c06927f2	x86: Declare check_efer() before it gets used This sparse warning: arch/x86/mm/init.c:83:16: warning: symbol 'check_efer' was not declared. Should it be static? triggers because check_efer() is not decalared before using it. asm/proto.h includes the declaration of check_efer(), so including asm/proto.h to fix that - this also addresses the sparse warning. Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <1246458263.6940.22.camel@hpdv5.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-07-01 16:52:54 +02:00
Yinghai Lu	66918dcdf9	x86: only clear node_states for 64bit Nathan reported that \| commit `73d60b7f74` \| Author: Yinghai Lu <yinghai@kernel.org> \| Date: Tue Jun 16 15:33:00 2009 -0700 \| \| page-allocator: clear N_HIGH_MEMORY map before we set it again \| \| SRAT tables may contains nodes of very small size. The arch code may \| decide to not activate such a node. However, currently the early boot \| code sets N_HIGH_MEMORY for such nodes. These nodes therefore seem to be \| active although these nodes have no present pages. \| \| For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too unintentionally and incorrectly clears the cpuset.mems cgroup attribute on an i386 kvm guest, meaning that cpuset.mems can not be used. Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only. and need to do save/restore for that in find_zone_movable_pfn Reported-by: Nathan Lynch <ntl@pobox.com> Tested-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-30 18:56:01 -07:00
Figo.zhang	565b0c1f10	x86, highmem_32.c: Clean up comment Signed-off-by: Figo.zhang <figo1802@gmail.com> Cc: Andrew Morton <akpm@osdl.org> LKML-Reference: <1246248175.5759.12.camel@myhost> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-29 06:14:43 +02:00
Akinobu Mita	087975b06b	x86: Clean up dump_pagetable() Use pgtable access helpers for 32-bit version dump_pagetable() and get rid of __typeof__() operators. This needs to make pmd_pfn() available for 2-level pgtable. Also, remove some casts for 64-bit version dump_pagetable(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090627063514.GA2834@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-29 06:14:42 +02:00
Pekka J Enberg	854c879f5a	x86: Move init_gbpages() to setup_arch() The init_gbpages() function is conditionally called from init_memory_mapping() function. There are two call-sites where this 'after_bootmem' condition can be true: setup_arch() and mem_init() via pci_iommu_alloc(). Therefore, it's safe to move the call to init_gbpages() to setup_arch() as it's always called before mem_init(). This removes an after_bootmem use - paving the way to remove all uses of that state variable. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-23 10:33:32 +02:00
Tejun Heo	e59a1bb2fd	x86: fix pageattr handling for lpage percpu allocator and re-enable it lpage allocator aliases a PMD page for each cpu and returns whatever is unused to the page allocator. When the pageattr of the recycled pages are changed, this makes the two aliases point to the overlapping regions with different attributes which isn't allowed and known to cause subtle data corruption in certain cases. This can be handled in simliar manner to the x86_64 highmap alias. pageattr code should detect if the target pages have PMD alias and split the PMD alias and synchronize the attributes. pcpur allocator is updated to keep the allocated PMD pages map sorted in ascending address order and provide pcpu_lpage_remapped() function which binary searches the array to determine whether the given address is aliased and if so to which address. pageattr is updated to use pcpu_lpage_remapped() to detect the PMD alias and split it up as necessary from cpa_process_alias(). Jan Beulich spotted the original problem and incorrect usage of vaddr instead of laddr for lookup. With this, lpage percpu allocator should work correctly. Re-enable it. [ Impact: fix subtle lpage pageattr bug and re-enable lpage ] Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Tejun Heo	992f4c1c2c	x86: reorganize cpa_process_alias() Reorganize cpa_process_alias() so that new alias condition can be added easily. Jan Beulich spotted problem in the original cleanup thread which incorrectly assumed the two existing conditions were mutially exclusive. [ Impact: code reorganization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu>	2009-06-22 11:56:24 +09:00
Linus Torvalds	d06063cc22	Move FAULT_FLAG_xyz into handle_mm_fault() callers This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-21 13:08:22 -07:00
Linus Torvalds	12e24f34cb	Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits) perfcounter: Handle some IO return values perf_counter: Push perf_sample_data through the swcounter code perf_counter tools: Define and use our own u64, s64 etc. definitions perf_counter: Close race in perf_lock_task_context() perf_counter, x86: Improve interactions with fast-gup perf_counter: Simplify and fix task migration counting perf_counter tools: Add a data file header perf_counter: Update userspace callchain sampling uses perf_counter: Make callchain samples extensible perf report: Filter to parent set by default perf_counter tools: Handle lost events perf_counter: Add event overlow handling fs: Provide empty .set_page_dirty() aop for anon inodes perf_counter: tools: Makefile tweaks for 64-bit powerpc perf_counter: powerpc: Add processor back-end for MPC7450 family perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels perf_counter: powerpc: Change how processor-specific back-ends get selected perf_counter: powerpc: Use unsigned long for register and constraint values perf_counter: powerpc: Enable use of software counters on 32-bit powerpc perf_counter tools: Add and use isprint() ...	2009-06-20 11:29:32 -07:00
Linus Torvalds	c4c5ab3089	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits) x86, mce: fix error path in mce_create_device() x86: use zalloc_cpumask_var for mce_dev_initialized x86: fix duplicated sysfs attribute x86: de-assembler-ize asm/desc.h i386: fix/simplify espfix stack switching, move it into assembly i386: fix return to 16-bit stack from NMI handler x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present x86: Remove duplicated #include's x86: msr.h linux/types.h is only required for __KERNEL__ x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround x86, mce: mce_intel.c needs <asm/apic.h> x86: apic/io_apic.c: dmar_msi_type should be static x86, io_apic.c: Work around compiler warning x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present x86: mce: Handle banks == 0 case in K7 quirk x86, boot: use .code16gcc instead of .code16 x86: correct the conversion of EFI memory types x86: cap iomem_resource to addressable physical memory x86, mce: rename _64.c files which are no longer 64-bit-specific x86, mce: mce.h cleanup ... Manually fix up trivial conflict in arch/x86/mm/fault.c	2009-06-20 10:49:48 -07:00
Linus Torvalds	7f81890687	x86: don't use 'access_ok()' as a range check in get_user_pages_fast() It's really not right to use 'access_ok()', since that is meant for the normal "get_user()" and "copy_from/to_user()" accesses, which are done through the TLB, rather than through the page tables. Why? access_ok() does both too few, and too many checks. Too many, because it is meant for regular kernel accesses that will not honor the 'user' bit in the page tables, and because it honors the USER_DS vs KERNEL_DS distinction that we shouldn't care about in GUP. And too few, because it doesn't do the 'canonical' check on the address on x86-64, since the TLB will do that for us. So instead of using a function that isn't meant for this, and does something else and much more complicated, just do the real rules: we don't want the range to overflow, and on x86-64, we want it to be a canonical low address (on 32-bit, all addresses are canonical). Acked-by: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-06-20 09:52:27 -07:00
Ingo Molnar	0c87197142	perf_counter, x86: Improve interactions with fast-gup Improve a few details in perfcounter call-chain recording that makes use of fast-GUP: - Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally racy and can be changed on another CPU, so we have to be careful about how we access them. The PAE branch is already careful with read-barriers - but the non-PAE and 64-bit side needs an ACCESS_ONCE() to make sure the pte value is observed only once. - make the checks a bit stricter so that we can feed it any kind of cra^H^H^H user-space input ;-) Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-19 16:55:16 +02:00
Ingo Molnar	a3d06cc6aa	Merge branch 'linus' into perfcounters/core Conflicts: arch/x86/include/asm/kmap_types.h include/linux/mm.h include/asm-generic/kmap_types.h Merge reason: We crossed changes with kmap_types.h cleanups in mainline. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 13:06:17 +02:00
Ingo Molnar	eadb8a091b	Merge branch 'linus' into tracing/hw-breakpoints Conflicts: arch/x86/Kconfig arch/x86/kernel/traps.c arch/x86/power/cpu.c arch/x86/power/cpu_32.c kernel/Makefile Semantic conflict: arch/x86/kernel/hw_breakpoint.c Merge reason: Resolve the conflicts, move from put_cpu_no_sched() to put_cpu() in arch/x86/kernel/hw_breakpoint.c. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 12:56:49 +02:00
Ingo Molnar	cc4949e1fd	Merge branch 'linus' into x86/urgent Merge reason: pull in latest to fix a bug in it. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-17 08:59:10 +02:00
Ingo Molnar	5dfaf90f80	x86: mm: Read cr2 before prefetching the mmap_lock Prefetch instructions can generate spurious faults on certain models of older CPUs. The faults themselves cannot be stopped and they can occur pretty much anywhere - so the way we solve them is that we detect certain patterns and ignore the fault. There is one small path of code where we must not take faults though: the #PF handler execution leading up to the reading of the CR2 (the faulting address). If we take a fault there then we destroy the CR2 value (with that of the prefetching instruction's) and possibly mishandle user-space or kernel-space pagefaults. It turns out that in current upstream we do exactly that: prefetchw(&mm->mmap_sem); /* Get the faulting address: */ address = read_cr2(); This is not good. So turn around the order: first read the cr2 then prefetch the lock address. Reading cr2 is plenty fast (2 cycles) so delaying the prefetch by this amount shouldnt be a big issue performance-wise. [ And this might explain a mystery fault.c warning that sometimes occurs on one an old AMD/Semptron based test-system i have - which does have such prefetch problems. ] Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <npiggin@suse.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> LKML-Reference: <20090616030522.GA22162@Krystal> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-16 10:23:32 +02:00
Peter Zijlstra	465a454f25	x86, mm: Add __get_user_pages_fast() Introduce a gup_fast() variant which is usable from IRQ/NMI context. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Nick Piggin <npiggin@suse.de> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-15 15:57:51 +02:00
Vegard Nossum	722f2a6c87	Merge commit 'linus/master' into HEAD Conflicts: MAINTAINERS Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:50:49 +02:00
Vegard Nossum	ac61a75796	kmemcheck: add opcode self-testing at boot We've had some troubles in the past with weird instructions. This patch adds a self-test framework which can be used to verify that a certain set of opcodes are decoded correctly. Of course, the opcodes which are not tested can still give the wrong results. In short, this is just a safeguard to catch unintentional changes in the opcode decoder. It does not mean that errors can't still occur! [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:49:22 +02:00
Vegard Nossum	b1eeab6768	kmemcheck: add hooks for the page allocator This adds support for tracking the initializedness of memory that was allocated with the page allocator. Highmem requests are not tracked. Cc: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> [build fix for !CONFIG_KMEMCHECK] Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 15:48:33 +02:00
Vegard Nossum	9e730237c2	kmemcheck: don't track page tables As these are allocated using the page allocator, we need to pass __GFP_NOTRACK before we add page allocator support to kmemcheck. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:40:11 +02:00
Vegard Nossum	f85612967c	x86: add hooks for kmemcheck The hooks that we modify are: - Page fault handler (to handle kmemcheck faults) - Debug exception handler (to hide pages after single-stepping the instruction that caused the page fault) Also redefine memset() to use the optimized version if kmemcheck is enabled. (Thanks to Pekka Enberg for minimizing the impact on the page fault handler.) As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable the optimized xor code, and rely instead on the generic C implementation in order to avoid false-positive warnings. Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no> [whitespace fixlet] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>	2009-06-15 12:40:02 +02:00
Pekka Enberg	f8b4ece2a9	kmemcheck: use kmemcheck_pte_lookup() instead of open-coding it Lets use kmemcheck_pte_lookup() in kmemcheck_fault() instead of open-coding it there. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:40:00 +02:00
Pekka Enberg	6d9609c132	kmemcheck: move 64-bit ifdef out of kmemcheck_opcode_decode() This patch moves the CONFIG_X86_64 ifdef out of kmemcheck_opcode_decode() by introducing a version of the function that always returns false for CONFIG_X86_32. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:39:59 +02:00
Pekka Enberg	0c33cacd86	kmemcheck: remove multiple ifdef'd definitions of the same global variable Multiple ifdef'd definitions of the same global variable is ugly and error-prone. Fix that up. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:39:57 +02:00
Pekka Enberg	5b53b76a61	kmemcheck: make initialization message less confusing The "Bugs, beware!" printout during is cute but confuses users that something bad happened so change the text to the more boring "Initialized" message. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:39:56 +02:00
Pekka Enberg	6a19638719	kmemcheck: remove forward declarations from error.c This patch reorders code in error.c so that we can get rid of the forward declarations. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>	2009-06-15 12:39:47 +02:00
Randy Dunlap	60e383931d	kmemcheck: include module.h to prevent warnings kmemcheck/shadow.c needs to include <linux/module.h> to prevent the following warnings: linux-next-20080724/arch/x86/mm/kmemcheck/shadow.c:64: warning : data definition has no type or storage class linux-next-20080724/arch/x86/mm/kmemcheck/shadow.c:64: warning : type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL' linux-next-20080724/arch/x86/mm/kmemcheck/shadow.c:64: warning : parameter names (without types) in function declaration Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: vegardno@ifi.uio.no Cc: penberg@cs.helsinki.fi Cc: akpm <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-13 15:37:49 +02:00
Vegard Nossum	dfec072ecd	kmemcheck: add the kmemcheck core General description: kmemcheck is a patch to the linux kernel that detects use of uninitialized memory. It does this by trapping every read and write to memory that was allocated dynamically (e.g. using kmalloc()). If a memory address is read that has not previously been written to, a message is printed to the kernel log. Thanks to Andi Kleen for the set_memory_4k() solution. Andrew Morton suggested documenting the shadow member of struct page. Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [export kmemcheck_mark_initialized] [build fix for setup_max_cpus] Signed-off-by: Ingo Molnar <mingo@elte.hu> [rebased for mainline inclusion] Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>	2009-06-13 15:37:30 +02:00
Shaohua Li	41d840e224	x86: change kernel_physical_mapping_init() __init to __meminit kernel_physical_mapping_init() could be called in memory hotplug path. [ Impact: fix potential crash with memory hotplug ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> LKML-Reference: <20090612045752.GA827@sli10-desk.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-12 14:39:21 +02:00
Yinghai Lu	55cd63676e	x86: make zap_low_mapping could be used early Only one cpu is there, just call __flush_tlb for it. Fixes the following boot warning on x86: [ 0.000000] Memory: 885032k/915540k available (5993k kernel code, 29844k reserved, 3842k data, 428k init, 0k highmem) [ 0.000000] virtual kernel memory layout: [ 0.000000] fixmap : 0xffe17000 - 0xfffff000 (1952 kB) [ 0.000000] vmalloc : 0xf8615000 - 0xffe15000 ( 120 MB) [ 0.000000] lowmem : 0xc0000000 - 0xf7e15000 ( 894 MB) [ 0.000000] .init : 0xc19a5000 - 0xc1a10000 ( 428 kB) [ 0.000000] .data : 0xc15da4bb - 0xc199af6c (3842 kB) [ 0.000000] .text : 0xc1000000 - 0xc15da4bb (5993 kB) [ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok. [ 0.000000] ------------[ cut here ]------------ [ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x50/0x1b0() [ 0.000000] Hardware name: System Product Name [ 0.000000] Modules linked in: [ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip #52504 [ 0.000000] Call Trace: [ 0.000000] [<c104aa16>] warn_slowpath_common+0x65/0x95 [ 0.000000] [<c104aa58>] warn_slowpath_null+0x12/0x15 [ 0.000000] [<c1073bbe>] smp_call_function_many+0x50/0x1b0 [ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41 [ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41 [ 0.000000] [<c1073d4f>] smp_call_function+0x31/0x58 [ 0.000000] [<c1037615>] ? do_flush_tlb_all+0x0/0x41 [ 0.000000] [<c104f635>] on_each_cpu+0x26/0x65 [ 0.000000] [<c10374b5>] flush_tlb_all+0x19/0x1b [ 0.000000] [<c1032ab3>] zap_low_mappings+0x4d/0x56 [ 0.000000] [<c15d64b5>] ? printk+0x14/0x17 [ 0.000000] [<c19b42a8>] mem_init+0x23d/0x245 [ 0.000000] [<c19a56a1>] start_kernel+0x17a/0x2d5 [ 0.000000] [<c19a5347>] ? unknown_bootoption+0x0/0x19a [ 0.000000] [<c19a5039>] __init_begin+0x39/0x41 [ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]--- Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>	2009-06-12 13:50:24 +03:00
Linus Torvalds	8a1ca8cedd	Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (574 commits) perf_counter: Turn off by default perf_counter: Add counter->id to the throttle event perf_counter: Better align code perf_counter: Rename L2 to LL cache perf_counter: Standardize event names perf_counter: Rename enums perf_counter tools: Clean up u64 usage perf_counter: Rename perf_counter_limit sysctl perf_counter: More paranoia settings perf_counter: powerpc: Implement generalized cache events for POWER processors perf_counters: powerpc: Add support for POWER7 processors perf_counter: Accurate period data perf_counter: Introduce struct for sample data perf_counter tools: Normalize data using per sample period data perf_counter: Annotate exit ctx recursion perf_counter tools: Propagate signals properly perf_counter tools: Small frequency related fixes perf_counter: More aggressive frequency adjustment perf_counter/x86: Fix the model number of Intel Core2 processors perf_counter, x86: Correct some event and umask values for Intel processors ...	2009-06-11 14:01:07 -07:00
Ingo Molnar	940010c5a3	Merge branch 'linus' into perfcounters/core Conflicts: arch/x86/kernel/irqinit.c arch/x86/kernel/irqinit_64.c arch/x86/kernel/traps.c arch/x86/mm/fault.c include/linux/sched.h kernel/exit.c	2009-06-11 17:55:42 +02:00
Peter Zijlstra	f4dbfa8f31	perf_counter: Standardize event names Pure renames only, to PERF_COUNT_HW_* and PERF_COUNT_SW_*. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-11 17:54:15 +02:00
Thomas Gleixner	9866b7e86a	x86: memtest: use pointers of equal type for comparison Commit `c9690998ef` (x86: memtest: remove 64-bit division) introduced following compile warning: arch/x86/mm/memtest.c: In function 'memtest': arch/x86/mm/memtest.c:56: warning: comparison of distinct pointer types lacks a cast arch/x86/mm/memtest.c:58: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2009-06-11 16:26:35 +02:00
Linus Torvalds	8623661180	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits) Revert "x86, bts: reenable ptrace branch trace support" tracing: do not translate event helper macros in print format ftrace/documentation: fix typo in function grapher name tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK tracing: add protection around module events unload tracing: add trace_seq_vprint interface tracing: fix the block trace points print size tracing/events: convert block trace points to TRACE_EVENT() ring-buffer: fix ret in rb_add_time_stamp ring-buffer: pass in lockdep class key for reader_lock tracing: add annotation to what type of stack trace is recorded tracing: fix multiple use of __print_flags and __print_symbolic tracing/events: fix output format of user stack tracing/events: fix output format of kernel stack tracing/trace_stack: fix the number of entries in the header ring-buffer: discard timestamps that are at the start of the buffer ring-buffer: try to discard unneeded timestamps ring-buffer: fix bug in ring_buffer_discard_commit ftrace: do not profile functions when disabled tracing: make trace pipe recognize latency format flag ...	2009-06-10 19:53:40 -07:00
Linus Torvalds	be15f9d63b	Merge branch 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-xen-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (42 commits) xen: cache cr0 value to avoid trap'n'emulate for read_cr0 xen/x86-64: clean up warnings about IST-using traps xen/x86-64: fix breakpoints and hardware watchpoints xen: reserve Xen start_info rather than e820 reserving xen: add FIX_TEXT_POKE to fixmap lguest: update lazy mmu changes to match lguest's use of kvm hypercalls xen: honour VCPU availability on boot xen: add "capabilities" file xen: drop kexec bits from /sys/hypervisor since kexec isn't implemented yet xen/sys/hypervisor: change writable_pt to features xen: add /sys/hypervisor support xen/xenbus: export xenbus_dev_changed xen: use device model for suspending xenbus devices xen: remove suspend_cancel hook xen/dev-evtchn: clean up locking in evtchn xen: export ioctl headers to userspace xen: add /dev/xen/evtchn driver xen: add irq_from_evtchn xen: clean up gate trap/interrupt constants xen: set _PAGE_NX in __supported_pte_mask before pagetable construction ...	2009-06-10 16:16:27 -07:00
Linus Torvalds	9b29e8228a	Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Clear TS in irq_ts_save() when in an atomic section x86: Detect use of extended APIC ID for AMD CPUs x86: memtest: remove 64-bit division x86, UV: Fix macros for multiple coherency domains x86: Fix non-lazy GS handling in sys_vm86() x86: Add quirk for reboot stalls on a Dell Optiplex 360 x86: Fix UV BAU activation descriptor init	2009-06-10 16:15:14 -07:00
Linus Torvalds	bb7762961d	Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) x86: fix system without memory on node0 x86, mm: Fix node_possible_map logic mm, x86: remove MEMORY_HOTPLUG_RESERVE related code x86: make sparse mem work in non-NUMA mode x86: process.c, remove useless headers x86: merge process.c a bit x86: use sparse_memory_present_with_active_regions() on UMA x86: unify 64-bit UMA and NUMA paging_init() x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB x86: Sanity check the e820 against the SRAT table using e820 map only x86: clean up and and print out initial max_pfn_mapped x86/pci: remove rounding quirk from e820_setup_gap() x86, e820, pci: reserve extra free space near end of RAM x86: fix typo in address space documentation x86: 46 bit physical address support on 64 bits x86, mm: fault.c, use printk_once() in is_errata93() x86: move per-cpu mmu_gathers to mm/init.c x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c x86: unify noexec handling x86: remove (null) in /sys kernel_page_tables ...	2009-06-10 16:13:20 -07:00
Linus Torvalds	7dc3ca39cb	Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, nmi: Use predefined numbers instead of hardcoded one x86: asm/processor.h: remove double declaration x86, mtrr: replace MTRRdefType_MSR with msr-index's MSR_MTRRdefType x86, mtrr: replace MTRRfix4K_C0000_MSR with msr-index's MSR_MTRRfix4K_C0000 x86, mtrr: remove mtrr MSRs double declaration x86, mtrr: replace MTRRfix16K_80000_MSR with msr-index's MSR_MTRRfix16K_80000 x86, mtrr: replace MTRRfix64K_00000_MSR with msr-index's MSR_MTRRfix64K_00000 x86, mtrr: replace MTRRcap_MSR with msr-index's MSR_MTRRcap x86: mce: remove duplicated #include x86: msr-index.h remove duplicate MSR C001_0015 declaration x86: clean up arch/x86/kernel/tsc_sync.c a bit x86: use symbolic name for VM86_SIGNAL when used as vm86 default return x86: added 'ifndef _ASM_X86_IOMAP_H' to iomap.h x86: avoid multiple declaration of kstack_depth_to_print x86: vdso/vma.c declare vdso_enabled and arch_setup_additional_pages before they get used x86: clean up declarations and variables x86: apic/x2apic_cluster.c x86_cpu_to_logical_apicid should be static x86 early quirks: eliminate unused function	2009-06-10 15:49:36 -07:00
Andreas Herrmann	c9690998ef	x86: memtest: remove 64-bit division Using gcc 3.3.5 a "make allmodconfig" + "CONFIG_KVM=n" triggers a build error: arch/x86/mm/built-in.o(.init.text+0x43f7): In function `__change_page_attr': arch/x86/mm/pageattr.c:114: undefined reference to `__udivdi3' make: *** [.tmp_vmlinux1] Error 1 The culprit turned out to be a division in arch/x86/mm/memtest.c For more info see this thread: http://marc.info/?l=linux-kernel&m=124416232620683 The patch entirely removes the division that caused the build error. [ Impact: build fix with certain GCC versions ] Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: xiyou.wangcong@gmail.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@kernel.org> LKML-Reference: <20090608170939.GB12431@alberich.amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-08 19:18:25 +02:00
K.Prasad	62edab9056	hw-breakpoints: reset bits in dr6 after the corresponding exception is handled This patch resets the bit in dr6 after the corresponding exception is handled in code, so that we keep a clean track of the current virtual debug status register. [ Impact: keep track of breakpoints triggering completion ] Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>	2009-06-02 22:47:00 +02:00
Ingo Molnar	23db9f430b	Merge branch 'linus' into perfcounters/core Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6 based - to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-06-01 10:01:39 +02:00
Mel Gorman	32b154c0b0	x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302 On x86 and x86-64, it is possible that page tables are shared beween shared mappings backed by hugetlbfs. As part of this, page_table_shareable() checks a pair of vma->vm_flags and they must match if they are to be shared. All VMA flags are taken into account, including VM_LOCKED. The problem is that VM_LOCKED is cleared on fork(). When a process with a shared memory segment forks() to exec() a helper, there will be shared VMAs with different flags. The impact is that the shared segment is sometimes considered shareable and other times not, depending on what process is checking. What happens is that the segment page tables are being shared but the count is inaccurate depending on the ordering of events. As the page tables are freed with put_page(), bad pmd's are found when some of the children exit. The hugepage counters also get corrupted and the Total and Free count will no longer match even when all the hugepage-backed regions are freed. This requires a reboot of the machine to "fix". This patch addresses the problem by comparing all flags except VM_LOCKED when deciding if pagetables should be shared or not for hugetlbfs-backed mapping. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ingo Molnar <mingo@elte.hu> Cc: <stable@kernel.org> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <starlight@binnacle.cx> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-05-29 08:40:03 -07:00
Pallipadi, Venkatesh	2171787be2	x86: avoid back to back on_each_cpu in cpa_flush_array Cleanup cpa_flush_array() to avoid back to back on_each_cpu() calls. [ Impact: optimizes fix `0af48f42df` ] Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-26 13:12:12 -07:00
venkatesh.pallipadi@intel.com	0af48f42df	x86: cpa_flush_array wbinvd should be done on all CPUs cpa_flush_array seems to prefer wbinvd() over clflush at 4M threshold. clflush needs to be done on only one CPU as per instruction definition. wbinvd() however, should be done on all CPUs. [ Impact: fix missing flush which could cause data corruption ] Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-22 13:33:59 -07:00
venkatesh.pallipadi@intel.com	0b827537e3	x86: bugfix wbinvd() model check instead of family check wbinvd is supported on all CPUs 486 or later. But, pageattr.c is checking x86_model >= 4 before wbinvd(), which looks like an oversight bug. It was first introduced at one place by changeset `d7c8f21a8c` and got copied over to second place in the same file later. [ Impact: fix missing cache flush on early-model CPUs, potential data corruption ] Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-22 13:33:27 -07:00
Ingo Molnar	1079cac0f4	Merge commit 'v2.6.30-rc6' into tracing/core Merge reason: we were on an -rc4 base, sync up to -rc6 Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 10:15:35 +02:00
Yinghai Lu	7c43769a97	x86, mm: Fix node_possible_map logic Recently there were some changes to the meaning of node_possible_map, and it is quite strange: - the node without memory would be set in node_possible_map - but some node with less NODE_MIN_SIZE will be kicked out of node_possible_map. fix it by adding strict_setup_node_bootmem(). Also, remove unparse_node(). so result will be: 1. cpu_to_node() will return online node only (nearest one) 2. apicid_to_node() still returns the node that could be not online but is set in node_possible_map. 3. node_possible_map will include nodes that mem on it are less NODE_MIN_SIZE v2: after move_cpus_to_node change. [ Impact: get node_possible_map right ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <4A0C49BE.6080800@kernel.org> [ v3: various small cleanups and comment clarifications ] Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:21:04 +02:00
Yinghai Lu	888a589f6b	mm, x86: remove MEMORY_HOTPLUG_RESERVE related code after: \| commit `b263295dbf` \| Author: Christoph Lameter <clameter@sgi.com> \| Date: Wed Jan 30 13:30:47 2008 +0100 \| \| x86: 64-bit, make sparsemem vmemmap the only memory model we don't have MEMORY_HOTPLUG_RESERVE anymore. Historically, x86-64 had an architecture-specific method for memory hotplug whereby it scanned the SRAT for physical memory ranges that could be potentially used for memory hot-add later. By reserving those ranges without physical memory, the memmap would be allocated and left dormant until needed. This depended on the DISCONTIG memory model which has been removed so the code implementing HOTPLUG_RESERVE is now dead. This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE. (Changelog authored by Mel.) v2: updated changelog, and remove hotadd= in doc [ Impact: remove dead code ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Mel Gorman <mel@csn.ul.ie> Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A0C4910.7090508@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 09:13:31 +02:00
Ingo Molnar	dc3f81b129	Merge commit 'v2.6.30-rc6' into perfcounters/core Merge reason: this branch was on an -rc4 base, merge it up to -rc6 to get the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-18 07:37:49 +02:00
Shaohua Li	ed077b58f6	x86: make sparse mem work in non-NUMA mode With sparse memory, holes should not be marked present for memmap. This patch makes sure sparsemem really works on SMP mode (!NUMA). [ Impact: use less memory to map fragmented RAM, avoid boot-OOM/crash ] Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Sheng Yang <sheng.yang@intel.com> LKML-Reference: <1242117600.22431.0.camel@sli10-desk.sh.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-12 11:26:35 +02:00
Pekka Enberg	087fa4e964	x86: use sparse_memory_present_with_active_regions() on UMA There's no need to use call memory_present() manually on UMA because initmem_init() sets up early_node_map by calling e820_register_active_regions(). [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1241699742.17846.31.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:52:06 +02:00
Pekka Enberg	3551f88f64	x86: unify 64-bit UMA and NUMA paging_init() 64-bit UMA and NUMA versions of paging_init() are almost identical. Therefore, merge the copy in mm/numa_64.c to mm/init_64.c to remove duplicate code. [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1241699741.17846.30.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:52:06 +02:00
Yinghai Lu	0964b0562b	x86: Allow 1MB of slack between the e820 map and SRAT, not 4GB It is expected that there might be slight differences between the e820 map and the SRAT table and the intention was that 1MB of slack be allowed. The calculation comparing e820ram and pxmram assumes the units are bytes, when they are in fact pages. This means 4GB of slack is being allowed, not 1MB. This patch makes the correct comparison. comment is from Mel. [ Impact: don't accept buggy SRATs that could dump up to 4G of RAM ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A03E13E.6050107@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:38:21 +02:00
Yinghai Lu	b37ab91907	x86: Sanity check the e820 against the SRAT table using e820 map only node_cover_memory() sanity checks the SRAT table by ensuring that all PXMs cover the memory reported in the e820. However, when calculating the size of the holes in the e820, it uses the early_node_map[] which contains information taken from both SRAT and e820. If the SRAT is missing an entry, then it is not detected that the SRAT table is incorrect and missing entries. This patch uses the e820 map to calculate the holes instead of early_node_map[]. comment is from Mel. [ Impact: reject incorrect SRAT tables ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4A03E10C.60906@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:35:07 +02:00
Yinghai Lu	80989ce064	x86: clean up and and print out initial max_pfn_mapped Do this so we can check the range that is mapped before init_memory_mapping(). To be able to print out meaningful info, we first have to fix 64-bit to have max_pfn_mapped assigned before that call. This also unifies the code-path a bit. [ Impact: print more debug info, cleanup ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <49BF0978.40605@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 11:11:12 +02:00
Ingo Molnar	134cbf35c7	Merge commit 'v2.6.30-rc5' into x86/mm Merge reason: this branch was on a .30-rc2 base - sync it up with all the latest fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-11 09:33:15 +02:00
Ingo Molnar	f066a15533	Merge branch 'x86/urgent' into x86/xen Conflicts: arch/frv/include/asm/pgtable.h arch/x86/include/asm/required-features.h arch/x86/xen/mmu.c Merge reason: x86/xen was on a .29 base still, move it to a fresher branch and pick up Xen fixes as well, plus resolve conflicts Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-08 10:50:00 +02:00
Jan Beulich	4983439676	x86-64: finish cleanup_highmaps()'s job wrt. _brk_end With the introduction of the .brk section, special care must be taken that no unused page table entries remain if _brk_end and _end are separated by a 2M page boundary. cleanup_highmap() runs very early and hence cannot take care of that, hence potential entries needing to be removed past _brk_end must be cleared once the brk allocator has done its job. [ Impact: avoids undesirable TLB aliases ] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-05-07 21:51:34 -07:00
Ingo Molnar	44347d947f	Merge branch 'linus' into tracing/core Merge reason: tracing/core was on a .30-rc1 base and was missing out on on a handful of tracing fixes present in .30-rc5-almost. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-07 11:17:34 +02:00
Nikanth Karthikesan	e0e5ea3268	x86: Fix a typo in a printk message [ Impact: printk message cleanup ] Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> LKML-Reference: <200905040908.27299.knikanth@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-06 12:23:12 +02:00
David Rientjes	7eccf7b227	x86, srat: do not register nodes beyond e820 map The mem= option will truncate the memory map at a specified address so it's not possible to register nodes with memory beyond the e820 upper bound. unparse_node() is only called when then node had memory associated with it, although with the mem= option it is no longer addressable. [ Impact: fix boot hang on certain (large) systems ] Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <alpine.DEB.2.00.0905051248150.20021@chino.kir.corp.google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-06 10:49:07 +02:00
Linus Torvalds	e91b3b2681	Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: x86, mmiotrace: fix range test tracing: fix ref count in splice pages	2009-05-05 12:08:02 -07:00
Ingo Molnar	a454ab3110	x86, mm: fault.c, use printk_once() in is_errata93() Andrew pointed out that the 'once' variable has a needlessly function-global scope. We can in fact eliminate it completely, via the use of printk_once(). [ Impact: cleanup ] Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-05-03 10:09:03 +02:00
Pekka Enberg	9518e0e435	x86: move per-cpu mmu_gathers to mm/init.c [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1240923650.1982.22.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-30 10:12:37 +02:00
Pekka Enberg	2b72394e40	x86: move max_pfn_mapped and max_low_pfn_mapped to setup.c This patch moves the max_pfn_mapped and max_low_pfn_mapped global variables to kernel/setup.c where they're initialized. [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1240923649.1982.21.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-30 10:12:36 +02:00
Ingo Molnar	e7fd5d4b3d	Merge branch 'linus' into perfcounters/core Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up the latest upstream fixes. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 14:47:05 +02:00
Stuart Bennett	0f9a623dd6	tracing: x86, mmiotrace: only register for die notifier when tracer active Follow up to `afcfe024ae` in Linus' tree ("x86: mmiotrace: quieten spurious warning message") Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1240946271-7083-5-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 11:33:34 +02:00
Stuart Bennett	46e91d00b1	tracing: x86, mmiotrace: refactor clearing/restore of page presence * change function names to clear_* from set_: in reality we only clear and restore page presence, and never unconditionally set present. Using clear_({true, false}, ...) is therefore more honest than set_({false, true}, ...) upgrade presence storage to pteval_t: doing user-space tracing will require saving and manipulation of the _PAGE_PROTNONE bit, in addition to the existing _PAGE_PRESENT changes, and having multiple bools stored and passed around does not seem optimal [ Impact: refactor, clean up mmiotrace code ] Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1240946271-7083-4-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 11:33:33 +02:00
Stuart Bennett	0492e1bb8f	tracing: x86, mmiotrace: code consistency/legibility improvement kmmio_probe being p and kmmio_fault_page being sometimes f and sometimes *p is not helpful. [ Impact: cleanup ] Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1240946271-7083-3-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 11:33:33 +02:00
Stuart Bennett	33015c8599	tracing: x86, mmiotrace: fix range test Matching on (addr == (p->addr + p->len)) causes problems when mappings are adjacent. [ Impact: fix mmiotrace confusion on adjacent iomaps ] Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Acked-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1240946271-7083-2-git-send-email-stuart@freedesktop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-29 11:32:22 +02:00
Yinghai Lu	4c31e92b97	x86: check boundary in setup_node_bootmem() Commit `dc09855` ("x86/uv: fix init of memory-less nodes") causes a two sockets system (where node-1 doesn't have RAM installed) to crash. That commit makes node_possible include cpu nodes that do not have memory. So check boundary in setup_node_bootmem(). [ Impact: fix boot crash on RAM-less NUMA node system ] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Jack Steiner <steiner@sgi.com> LKML-Reference: <49EF89DF.9090404@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-23 09:58:56 +02:00
Pekka Enberg	89388913f2	x86: unify noexec handling This patch unifies noexec handling on 32-bit and 64-bit. [ Impact: cleanup ] Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> [ mingo@elte.hu: build fix ] LKML-Reference: <1240303167.771.69.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-21 10:48:08 +02:00
Ingo Molnar	8ecee4620e	Merge branch 'linus' into x86/mm Merge reason: refresh the topic: there's been 290 non-merges commits upstream to arch/x86 alone. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-21 10:46:54 +02:00
Ingo Molnar	62d1702909	Merge branch 'linus' into x86/urgent Merge reason: We need the x86/uv updates from upstream, to queue up dependent fix. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-20 18:08:12 +02:00
Jaswinder Singh Rajput	a81b6314e0	x86: mm/numa_32.c calculate_numa_remap_pages should use __init calculate_numa_remap_pages() is called only by __init initmem_init() further calculate_numa_remap_pages is calling: __init find_e820_area() and __init reserve_early() So calculate_numa_remap_pages() should be __init calculate_numa_remap_pages(). WARNING: arch/x86/built-in.o(.text+0x82ea3): Section mismatch in reference from the function calculate_numa_remap_pages() to the function .init.text:find_e820_area() The function calculate_numa_remap_pages() references the function __init find_e820_area(). This is often because calculate_numa_remap_pages lacks a __init annotation or the annotation of find_e820_area is wrong. WARNING: arch/x86/built-in.o(.text+0x82f5f): Section mismatch in reference from the function calculate_numa_remap_pages() to the function .init.text:reserve_early() The function calculate_numa_remap_pages() references the function __init reserve_early(). This is often because calculate_numa_remap_pages lacks a __init annotation or the annotation of reserve_early is wrong. [ Impact: save memory, address Section mismatch warning ] Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <1239991281.3153.4.camel@ht.satnam> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-17 22:43:13 +02:00
Jack Steiner	dc09855191	x86/uv: fix init of memory-less nodes Add support for nodes that have cpus but no memory. The current code was failing to add these nodes to the nodes_present_map. v2: Fixes case caught by David Rientjes - missed support for the x2apic SRAT table. [ Impact: fix potential boot crash on memory-less UV nodes. ] Reported-by: David Rientjes <rientjes@google.com> Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20090417142242.GA23743@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-17 22:42:12 +02:00
Linus Torvalds	b9836e0837	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: fix microcode driver newly spewing warnings x86, PAT: Remove page granularity tracking for vm_insert_pfn maps x86: disable X86_PTRACE_BTS for now x86, documentation: kernel-parameters replace X86-32,X86-64 with X86 x86: pci-swiotlb.c swiotlb_dma_ops should be static x86, PAT: Remove duplicate memtype reserve in devmem mmap x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() x86, PAT: Changing memtype to WC ensuring no WB alias x86, PAT: Handle faults cleanly in set_memory_ APIs x86, PAT: Change order of cpa and free in set_memory_wb x86, CPA: Change idmap attribute before ioremap attribute setup	2009-04-17 09:56:11 -07:00
Pallipadi, Venkatesh	4b06504627	x86, PAT: Remove page granularity tracking for vm_insert_pfn maps This change resolves the problem of too many single page entries in pat_memtype_list and "freeing invalid memtype" errors with i915, reported here: http://marc.info/?l=linux-kernel&m=123845244713183&w=2 Remove page level granularity track and untrack of vm_insert_pfn. memtype tracking at page granularity does not scale and cleaner approach would be for the driver to request a type for a bigger IO address range or PCI io memory range for that device, either at mmap time or driver init time and just use that type during vm_insert_pfn. This patch just removes the track/untrack of vm_insert_pfn. That means we will be in same state as 2.6.28, with respect to these APIs. Newer APIs for the drivers to request a memtype for a bigger region is coming soon. [ Impact: fix Xorg startup warnings and hangs ] Reported-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Tested-by: Arkadiusz Miskiewicz <a.miskiewicz@gmail.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> LKML-Reference: <20090408223716.GC3493@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-17 00:44:22 +02:00
Yinghai Lu	6424fb3866	x86: remove (null) in /sys kernel_page_tables Impact: cleanup %p prints out 0x000000000000000 as (null) so use %lx instead. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <49E43282.1090607@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-14 11:50:22 +02:00
Jaswinder Singh Rajput	2c1b284e4f	x86: clean up declarations and variables Impact: cleanup, no code changed - syscalls.h update declarations due to unifications - irq.c declare smp_generic_interrupt() before it gets used - process.c declare sys_fork() and sys_vfork() before they get used - tsc.c rename tsc_khz shadowed variable - apic/probe_32.c declare apic_default before it gets used - apic/nmi.c prev_nmi_count should be unsigned - apic/io_apic.c declare smp_irq_move_cleanup_interrupt() before it gets used - mm/init.c declare direct_gbpages and free_initrd_mem before they get used Signed-off-by: Jaswinder Singh Rajput <jaswinder@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-12 15:20:16 +02:00
Marcin Slusarz	1ee4bd92a7	x86: fix wrong section of pat_disable & make it static pat_disable cannot be __cpuinit anymore because it's called from pat_init and the callchain looks like this: pat_disable [cpuinit] <- pat_init <- generic_set_all <- ipi_handler <- set_mtrr <- (other non init/cpuinit functions) WARNING: arch/x86/mm/built-in.o(.text+0x449e): Section mismatch in reference from the function pat_init() to the function .cpuinit.text:pat_disable() The function pat_init() references the function __cpuinit pat_disable(). This is often because pat_init lacks a __cpuinit annotation or the annotation of pat_disable is wrong. Non CONFIG_X86_PAT version of pat_disable is static inline, so this version can be static too (and there are no callers outside of this file). Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Acked-by: Sam Ravnborg <sam@ravnborg.org> LKML-Reference: <49DFB055.6070405@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-12 12:34:23 +02:00
Masami Hiramatsu	9b987aeb4a	x86: fix set_fixmap to use phys_addr_t Impact: fix kprobes crash on 32-bit with RAM above 4G Use phys_addr_t for receiving a physical address argument instead of unsigned long. This allows fixmap to handle pages higher than 4GB on x86-32. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: systemtap-ml <systemtap@sources.redhat.com> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <49DE3695.6040800@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 20:27:13 +02:00
Suresh Siddha	0c3c8a1836	x86, PAT: Remove duplicate memtype reserve in devmem mmap /dev/mem mmap code was doing memtype reserve/free for a while now. Recently we added memtype tracking in remap_pfn_range, and /dev/mem mmap uses it indirectly. So, we don't need seperate tracking in /dev/mem code any more. That means another ~100 lines of code removed :-). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090409212709.085210000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:55:48 +02:00
Suresh Siddha	b6ff32d9aa	x86, PAT: Consolidate code in pat_x_mtrr_type() and reserve_memtype() Fix pat_x_mtrr_type() to use UC_MINUS when the mtrr type return UC. This is to be consistent with ioremap() and ioremap_nocache() which uses UC_MINUS. Consolidate the code such that reserve_memtype() also uses pat_x_mtrr_type() when the caller doesn't specify any special attribute (non WB attribute). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090409212708.939936000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:55:48 +02:00
venkatesh.pallipadi@intel.com	3869c4aa18	x86, PAT: Changing memtype to WC ensuring no WB alias As per SDM, there should not be any aliasing of a WC with any cacheable type across CPUs. That is if one CPU is changing the identity map memtype to _WC, no other CPU at the time of this change should not have a TLB for this page that carries a WB attribute. SDM suggests to make the page not present. But for that we will have to handle any page faults that can potentially happen due to these pages being not present. Other way to deal with this without having any WB mapping is to change the page first to UC and then to WC. This ensures that we meet the SDM requirement of no cacheable alais to WC page. This also has same or lower overhead than marking the page not present and making it present later. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090409212708.797481000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:55:47 +02:00
venkatesh.pallipadi@intel.com	9fa3ab390a	x86, PAT: Handle faults cleanly in set_memory_ APIs Handle faults and do proper cleanups in set_memory_*() functions. In some cases, these functions were not doing proper free on failure paths. With the changes to tracking memtype of RAM pages in struct page instead of pat list, we do not need the changes in commits c5e147. This patch reverts that change. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090409212708.653222000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:55:47 +02:00
venkatesh.pallipadi@intel.com	a5593e0b32	x86, PAT: Change order of cpa and free in set_memory_wb To be free of aliasing due to races, set_memory_* interfaces should follow ordering of reserving, changing memtype to UC/WC, changing memtype back to WB followed by free. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20090409212708.512280000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:55:46 +02:00
Suresh Siddha	43a432b155	x86, CPA: Change idmap attribute before ioremap attribute setup Change the identity mapping with the requested attribute first, before we setup the virtual memory mapping with the new requested attribute. This makes sure that there is no window when identity map'ed attribute may disagree with ioremap range on the attribute type. This also avoids doing cpa on the ioremap'ed address twice (first in ioremap_page_range and then in ioremap_change_attr using vaddr), and should improve ioremap performance a bit. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> LKML-Reference: <20090409212708.373330000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:55:46 +02:00
Andy Grover	a0d22f485a	x86: Document get_user_pages_fast() While better than get_user_pages(), the usage of gupf(), especially the return values and the fact that it can potentially only partially pin the range, warranted some documentation. Signed-off-by: Andy Grover <andy.grover@oracle.com> Cc: npiggin@suse.de Cc: akpm@linux-foundation.org LKML-Reference: <1239320729-3262-1-git-send-email-andy.grover@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-10 13:14:08 +02:00
Peter Zijlstra	78f13e9525	perf_counter: allow for data addresses to be recorded Paul suggested we allow for data addresses to be recorded along with the traditional IPs as power can provide these. For now, only the software pagefault events provide data addresses, but in the future power might as well for some events. x86 doesn't seem capable of providing this atm. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> LKML-Reference: <20090408130409.394816925@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-08 19:05:56 +02:00
Jeremy Fitzhardinge	38f4b8c0da	Merge commit 'origin/master' into for-linus/xen/master * commit 'origin/master': (4825 commits) Fix build errors due to CONFIG_BRANCH_TRACER=y parport: Use the PCI IRQ if offered tty: jsm cleanups Adjust path to gpio headers KGDB_SERIAL_CONSOLE check for module Change KCONFIG name tty: Blackin CTS/RTS Change hardware flow control from poll to interrupt driven Add support for the MAX3100 SPI UART. lanana: assign a device name and numbering for MAX3100 serqt: initial clean up pass for tty side tty: Use the generic RS485 ioctl on CRIS tty: Correct inline types for tty_driver_kref_get() splice: fix deadlock in splicing to file nilfs2: support nanosecond timestamp nilfs2: introduce secondary super block nilfs2: simplify handling of active state of segments nilfs2: mark minor flag for checkpoint created by internal operation nilfs2: clean up sketch file nilfs2: super block operations fix endian bug ... Conflicts: arch/x86/include/asm/thread_info.h arch/x86/lguest/boot.c drivers/xen/manage.c	2009-04-07 13:34:16 -07:00
Peter Zijlstra	ac17dc8e58	perf_counter: provide major/minor page fault software events Provide separate sw counters for major and minor page faults. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:40 +02:00
Peter Zijlstra	7dd1fcc258	perf_counter: provide pagefault software events We use the generic software counter infrastructure to provide page fault events. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-06 09:29:37 +02:00
Linus Torvalds	32fb6c1756	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (140 commits) ACPI: processor: use .notify method instead of installing handler directly ACPI: button: use .notify method instead of installing handler directly ACPI: support acpi_device_ops .notify methods toshiba-acpi: remove MAINTAINERS entry ACPI: battery: asynchronous init acer-wmi: Update copyright notice & documentation acer-wmi: Cleanup the failure cleanup handling acer-wmi: Blacklist Acer Aspire One video: build fix thinkpad-acpi: rework brightness support thinkpad-acpi: enhanced debugging messages for the fan subdriver thinkpad-acpi: enhanced debugging messages for the hotkey subdriver thinkpad-acpi: enhanced debugging messages for rfkill subdrivers thinkpad-acpi: restrict access to some firmware LEDs thinkpad-acpi: remove HKEY disable functionality thinkpad-acpi: add new debug helpers and warn of deprecated atts thinkpad-acpi: add missing log levels thinkpad-acpi: cleanup debug helpers thinkpad-acpi: documentation cleanup thinkpad-acpi: drop ibm-acpi alias ...	2009-04-05 11:16:25 -07:00
Linus Torvalds	714f83d5d9	Merge branch 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (413 commits) tracing, net: fix net tree and tracing tree merge interaction tracing, powerpc: fix powerpc tree and tracing tree interaction ring-buffer: do not remove reader page from list on ring buffer free function-graph: allow unregistering twice trace: make argument 'mem' of trace_seq_putmem() const tracing: add missing 'extern' keywords to trace_output.h tracing: provide trace_seq_reserve() blktrace: print out BLK_TN_MESSAGE properly blktrace: extract duplidate code blktrace: fix memory leak when freeing struct blk_io_trace blktrace: fix blk_probes_ref chaos blktrace: make classic output more classic blktrace: fix off-by-one bug blktrace: fix the original blktrace blktrace: fix a race when creating blk_tree_root in debugfs blktrace: fix timestamp in binary output tracing, Text Edit Lock: cleanup tracing: filter fix for TRACE_EVENT_FORMAT events ftrace: Using FTRACE_WARN_ON() to check "freed record" in ftrace_release() x86: kretprobe-booster interrupt emulation code fix ... Fix up trivial conflicts in arch/parisc/include/asm/ftrace.h include/linux/memory.h kernel/extable.c kernel/module.c	2009-04-05 11:04:19 -07:00
Linus Torvalds	90975ef712	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits) cpumask: remove cpumask allocation from idle_balance, fix numa, cpumask: move numa_node_id default implementation to topology.h, fix cpumask: remove cpumask allocation from idle_balance x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus x86: cpumask: update 32-bit APM not to mug current->cpus_allowed x86: microcode: cleanup x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash numa, cpumask: move numa_node_id default implementation to topology.h cpumask: convert node_to_cpumask_map[] to cpumask_var_t cpumask: remove x86 cpumask_t uses. cpumask: use cpumask_var_t in uv_flush_tlb_others. cpumask: remove cpumask_t assignment from vector_allocation_domain() cpumask: make Xen use the new operators. cpumask: clean up summit's send_IPI functions cpumask: use new cpumask functions throughout x86 x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t cpumask: convert node_to_cpumask_map[] to cpumask_var_t x86: unify 32 and 64-bit node_to_cpumask_map ...	2009-04-05 10:33:07 -07:00
Len Brown	478c6a43fc	Merge branch 'linus' into release Conflicts: arch/x86/kernel/cpu/cpufreq/longhaul.c Signed-off-by: Len Brown <len.brown@intel.com>	2009-04-05 02:14:15 -04:00
Suresh Siddha	7237d3de78	x86, ACPI: add support for x2apic ACPI extensions All logical processors with APIC ID values of 255 and greater will have their APIC reported through Processor X2APIC structure (type-9 entry type) and all logical processors with APIC ID less than 255 will have their APIC reported through legacy Processor Local APIC (type-0 entry type) only. This is the same case even for NMI structure reporting. The Processor X2APIC Affinity structure provides the association between the X2APIC ID of a logical processor and the proximity domain to which the logical processor belongs. For OSPM, Procssor IDs outside the 0-254 range are to be declared as Device() objects in the ACPI namespace. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>	2009-04-03 20:08:12 -04:00
Andrew Morton	6a491e2e3e	x86: fix is_io_mapping_possible() build warning on i386 allnoconfig i386 allnoconfig: arch/x86/mm/iomap_32.c: In function 'is_io_mapping_possible': arch/x86/mm/iomap_32.c:27: warning: comparison is always false due to limited range of data type Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-03 18:39:05 +02:00
Akinobu Mita	a7f8c50d90	x86, mm: fix misuse of debug_kmap_atomic Impact: fix CONFIG_DEBUG_HIGHMEM=y breakage Commit `7ca43e756` ("mm: use debug_kmap_atomic") introduced some debug_kmap_atomic() calls in the wrong places. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <20090402070126.GA3951@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-04-02 16:37:04 +02:00
Ingo Molnar	8302294f43	Merge branch 'tracing/core-v2' into tracing-for-linus Conflicts: include/linux/slub_def.h lib/Kconfig.debug mm/slob.c mm/slub.c	2009-04-02 00:49:02 +02:00
Akinobu Mita	7ca43e7564	mm: use debug_kmap_atomic Use debug_kmap_atomic in kmap_atomic, kmap_atomic_pfn, and iomap_atomic_prot_pfn. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Akinobu Mita	f4112de6b6	mm: introduce debug_kmap_atomic x86 has debug_kmap_atomic_prot() which is error checking function for kmap_atomic. It is usefull for the other architectures, although it needs CONFIG_TRACE_IRQFLAGS_SUPPORT. This patch exposes it to the other architectures. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-04-01 08:59:14 -07:00
Ingo Molnar	65fb0d23fc	Merge branch 'linus' into cpumask-for-linus Conflicts: arch/x86/kernel/cpu/common.c	2009-03-30 23:53:32 +02:00
Ingo Molnar	a2bcd4731f	x86/mm: further cleanups of fault.c's include file section Impact: cleanup Eliminate more than 20 unnecessary #include lines in fault.c Also fix include file dependency bug in asm/traps.h. (this was masked before, by implicit inclusion) Signed-off-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <tip-56aea8468746e673a4bf50b6a13d97b2d1cbe1e8@git.kernel.org> Acked-by: H. Peter Anvin <hpa@linux.intel.com>	2009-03-30 14:02:02 +02:00
Jeremy Fitzhardinge	b8bcfe997e	x86/paravirt: remove lazy mode in interrupts Impact: simplification, robustness Make paravirt_lazy_mode() always return PARAVIRT_LAZY_NONE when in an interrupt. This prevents interrupt code from accidentally inheriting an outer lazy state, and instead does everything synchronously. Outer batched operations are left deferred. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de>	2009-03-29 23:35:38 -07:00
Ingo Molnar	6e15cf0486	Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2 Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-27 17:28:43 +01:00
Wang Chen	9f4f25c86f	x86: early_ioremap_init(), use __fix_to_virt(), because we are sure it's safe Tetsuo Handa reported this link bug: \| arch/x86/mm/built-in.o(.init.text+0x1831): In function `early_ioremap_init': \| : undefined reference to `__this_fixmap_does_not_exist' \| make: *** [.tmp_vmlinux1] Error 1 Commit:8827247ffcc9e880cbe4705655065cf011265157 used a variable (which would be optimized to constant) as fix_to_virt()'s parameter. It's depended on gcc's optimization and fails on old gcc. (Tetsuo used gcc 3.3) We can use __fix_to_vir() instead, because we know it's safe and don't need link time error reporting. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Cc: sfr@canb.auug.org.au LKML-Reference: <49C9FFEA.7060908@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-25 14:07:11 +01:00
Jeremy Fitzhardinge	45c7b28f3c	Revert "x86: create a non-zero sized bm_pte only when needed" This reverts commit `698609bdcd`. 69860 breaks Xen booting, as it relies on head.S to set up the fixmap pagetables (as a side-effect of initializing the USB debug port). Xen, however, does not boot via head.S, and so the fixmap area is not initialized. The specific symptom of the crash is a fault in dmi_scan(), because the pointer that early_ioremap returns is not actually present. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jan Beulich <jbeulich@novell.com> LKML-Reference: <49C43A8E.5090203@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-21 17:11:41 +01:00
venkatesh.pallipadi@intel.com	0f3507555f	x86, CPA: Add set_pages_arrayuc and set_pages_array_wb Add new interfaces: set_pages_array_uc() set_pages_array_wb() that can be used change the page attribute for a bunch of pages with flush etc done once at the end of all the changes. These interfaces are similar to existing set_memory_array_uc() and set_memory_array_wc(). Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: arjan@infradead.org Cc: eric@anholt.net Cc: airlied@redhat.com LKML-Reference: <20090319215358.901545000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-20 10:34:49 +01:00
venkatesh.pallipadi@intel.com	9ae2847591	x86, PAT: Add support for struct page pointer array in cpa set_clr Add struct page array pointer to cpa struct and CPA_PAGES_ARRAY. With that we can add change_page_attr_set_clr() a parameter to pass struct page array pointer and that can be handled by the underlying cpa code. cpa_flush_array() is also changed to support both addr array or struct page pointer array, depending on the flag. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: arjan@infradead.org Cc: eric@anholt.net Cc: airlied@redhat.com LKML-Reference: <20090319215358.758513000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-20 10:34:48 +01:00
venkatesh.pallipadi@intel.com	728c951887	x86, CPA: Add a flag parameter to cpa set_clr() Change change_page_attr_set_clr() array parameter to a flag. This helps following patches which adds an interface to change attr to uc/wb over a set of pages referred by struct page. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: arjan@infradead.org Cc: eric@anholt.net Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: airlied@redhat.com LKML-Reference: <20090319215358.611346000@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-20 10:34:47 +01:00
Jeremy Fitzhardinge	b40c757964	x86/32: no need to use set_pte_present in set_pte_vaddr Impact: cleanup, remove last user of set_pte_present set_pte_vaddr() is only used to install ptes in fixmaps, and should never be used to overwrite a present mapping. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Xen-devel <xen-devel@lists.xensource.com> LKML-Reference: <1237406613-2929-1-git-send-email-jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-19 14:04:18 +01:00
Rusty Russell	c38da5692e	x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus Impact: cleanup, reduce memory usage for CONFIG_CPUMASK_OFFSTACK=y Part of the "getting rid of obsolete cpumask_t" patch: 1) Use cpumask_var_t: this is a pointer if CONFIG_CPUMASK_OFFSTACK=y 2) Call alloc_cpumask_var() on first entry into enter_uniprocessor() 3) Use modern cpumask_* functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Pekka Paalanen <pq@iki.fi> LKML-Reference: <200903111633.55952.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-18 13:51:51 +01:00
Ingo Molnar	705bb9dc72	Merge branches 'x86/cleanups', 'x86/cpu', 'x86/debug', 'x86/mce2', 'x86/mm', 'x86/mtrr', 'x86/setup', 'x86/setup-memory', 'x86/urgent', 'x86/uv', 'x86/x2apic' and 'linus' into x86/core Conflicts: arch/parisc/kernel/irq.c	2009-03-18 13:19:49 +01:00
Suresh Siddha	ce4e240c27	x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths Impact: optimize APIC IPI related barriers Uncached MMIO accesses for xapic are inherently serializing and hence we don't need explicit barriers for xapic IPI paths. x2apic MSR writes/reads don't have serializing semantics and hence need a serializing instruction or mfence, to make all the previous memory stores globally visisble before the x2apic msr write for IPI. Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "steiner@sgi.com" <steiner@sgi.com> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-18 09:36:14 +01:00
Akinobu Mita	0920dce7d5	x86, mm: remove unnecessary include file from iomap_32.c asm/highmem.h inclusion is added to use kmap_atomic_prot_pfn() by commit `bb6d59ca92` Now kmap_atomic_prot_pfn is moved to iomap_32.c by commit `dd63fdcc63` So the asm/highmem.h inclusion in iomap_32.c is unnecessary now. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090315151517.GA29074@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-15 20:05:08 +01:00
Jeremy Fitzhardinge	93dbda7cbc	x86: add brk allocation for very, very early allocations Impact: new interface Add a brk()-like allocator which effectively extends the bss in order to allow very early code to do dynamic allocations. This is better than using statically allocated arrays for data in subsystems which may never get used. The space for brk allocations is in the bss ELF segment, so that the space is mapped properly by the code which maps the kernel, and so that bootloaders keep the space free rather than putting a ramdisk or something into it. The bss itself, delimited by __bss_stop, ends before the brk area (__brk_base to __brk_limit). The kernel text, data and bss is reserved up to __bss_stop. Any brk-allocated data is reserved separately just before the kernel pagetable is built, as that code allocates from unreserved spaces in the e820 map, potentially allocating from any unused brk memory. Ultimately any unused memory in the brk area is used in the general kernel memory pool. Initially the brk space is set to 1MB, which is probably much larger than any user needs (the largest current user is i386 head_32.S's code to build the pagetables to map the kernel, which can get fairly large with a big kernel image and no PSE support). So long as the system has sufficient memory for the bootloader to reserve the kernel+1MB brk, there are no bad effects resulting from an over-large brk. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-03-14 15:37:14 -07:00
Ingo Molnar	0ca0f16fd1	Merge branches 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/debug', 'x86/kconfig', 'x86/mm', 'x86/ptrace', 'x86/setup' and 'x86/urgent'; commit 'v2.6.29-rc8' into x86/core	2009-03-14 16:25:40 +01:00
Ingo Molnar	62395efdb0	Merge branch 'x86/asm' into tracing/syscalls We need the wider TIF work-mask checks in entry_32.S. Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-14 09:44:08 +01:00
Rusty Russell	0b966252d9	cpumask: convert node_to_cpumask_map[] to cpumask_var_t Impact: fix (CONFIG_MAXSMP=y only) boot crash `c032ef60d1` "cpumask: convert node_to_cpumask_map[] to cpumask_var_t" didn't get this one conversion. There was a compile warning, but I missed it. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Mike Travis <travis@sgi.com> LKML-Reference: <200903132342.42813.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 14:35:31 +01:00
Ingo Molnar	238a5b4bff	Merge branch 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-x86 into cpus4096	2009-03-13 05:54:55 +01:00
Ingo Molnar	17d85bc756	Merge commit 'v2.6.29-rc8' into cpus4096	2009-03-13 05:54:43 +01:00
Rusty Russell	73e907de7d	cpumask: remove x86 cpumask_t uses. Impact: cleanup We are removing cpumask_t in favour of struct cpumask: mainly as a marker of what code is now CONFIG_CPUMASK_OFFSTACK-safe. The only non-trivial change here is vector_allocation_domain(): explicitly clear the mask and set the first word, rather than using assignment. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:57 +10:30
Rusty Russell	c032ef60d1	cpumask: convert node_to_cpumask_map[] to cpumask_var_t Impact: reduce kernel memory usage when CONFIG_CPUMASK_OFFSTACK=y Straightforward conversion: done for 32 and 64 bit kernels. node_to_cpumask_map is now a cpumask_var_t array. 64-bit used to be a dynamic cpumask_t array, and 32-bit used to be a static cpumask_t array. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:53 +10:30
Rusty Russell	71ee73e722	x86: unify 32 and 64-bit node_to_cpumask_map Impact: cleanup We take the 64-bit code and use it on 32-bit as well. The new file is called mm/numa.c. In a minor cleanup, we use cpu_none_mask instead of declaring a local cpu_mask_none. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:52 +10:30
Rusty Russell	b9c4398ed4	cpumask: remove x86's node_to_cpumask now everyone uses cpumask_of_node Impact: cleanup Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>	2009-03-13 14:49:52 +10:30
Pallipadi, Venkatesh	4bb9c5c021	VM, x86, PAT: Change is_linear_pfn_mapping to not use vm_pgoff Impact: fix false positive PAT warnings - also fix VirtalBox hang Use of vma->vm_pgoff to identify the pfnmaps that are fully mapped at mmap time is broken. vm_pgoff is set by generic mmap code even for cases where drivers are setting up the mappings at the fault time. The problem was originally reported here: http://marc.info/?l=linux-kernel&m=123383810628583&w=2 Change is_linear_pfn_mapping logic to overload VM_INSERTPAGE flag along with VM_PFNMAP to mean full PFNMAP setup at mmap time. Problem also tracked at: http://bugzilla.kernel.org/show_bug.cgi?id=12800 Reported-by: Thomas Hellstrom <thellstrom@vmware.com> Tested-by: Frans Pop <elendil@planet.nl> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha>@intel.com> Cc: Nick Piggin <npiggin@suse.de> Cc: "ebiederm@xmission.com" <ebiederm@xmission.com> Cc: <stable@kernel.org> # only for 2.6.29.1, not .28 LKML-Reference: <20090313004527.GA7176@linux-os.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 04:28:50 +01:00
Jan Beulich	698609bdcd	x86: create a non-zero sized bm_pte only when needed Impact: kernel image size reduction Since in most configurations the pmd page needed maps the same range of virtual addresses which is also mapped by the earlier inserted one for covering FIX_DBGP_BASE, that page (and its insertion in the page tables) can be avoided altogether by detecting the condition at compile time. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <49B91826.76E4.0078.0@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 02:37:20 +01:00
Jan Beulich	dc9dd5cc85	x86: move save_mr() into .meminit.text Impact: cleanup, save memory The function is only being called from boot or memory hotplug paths. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <49B910B6.76E4.0078.0@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 02:37:18 +01:00
Jan Beulich	13c6c53282	x86, 32-bit: also use cpuinfo_x86's x86_{phys,virt}_bits members Impact: 32/64-bit consolidation In a first step, this allows fixing phys_addr_valid() for PAE (which until now reported all addresses to be valid). Subsequently, this will also allow simplifying some MTRR handling code. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <49B9101E.76E4.0078.0@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 02:37:17 +01:00
Ingo Molnar	dd63fdcc63	x86: unify kmap_atomic_pfn() and iomap_atomic_prot_pfn(), fix Impact: build fix Move kmap_atomic_prot_pfn() to iomap_32.c. It is used on all 32-bit kernels, while highmem_32.c is only built on highmem kernels. ( Note: the debug_kmap_atomic_prot() check is removed for now, that problem is handled via another patch. ) Reported-by: Thomas Gleixner <tglx@linutronix.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090311143317.GA22244@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-13 03:24:53 +01:00
Ingo Molnar	480c93df5b	Merge branch 'core/locking' into tracing/ftrace	2009-03-13 01:33:21 +01:00
Ingo Molnar	a98fe7f342	Merge branches 'x86/asm', 'x86/debug', 'x86/mm', 'x86/setup', 'x86/urgent' and 'linus' into x86/core	2009-03-12 11:50:15 +01:00
Stuart Bennett	afcfe024ae	x86: mmiotrace: quieten spurious warning message This message was being incorrectly emitted when using gdb, so compile it out by default for now; there will be a better fix in v2.6.30. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Acked-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-11 21:41:58 +01:00
Ingo Molnar	211b3d03c7	x86: work around Fedora-11 x86-32 kernel failures on Intel Atom CPUs Impact: work around boot crash Work around Intel Atom erratum AAH41 (probabilistically) - it's triggering in the field. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Tested-by: Kyle McMartin <kyle@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-11 18:22:03 +01:00
Akinobu Mita	12074fa107	x86: debug check for kmap_atomic_pfn and iomap_atomic_prot_pfn() It may be useful for kmap_atomic_pfn() and iomap_atomic_prot_pfn() to check invalid kmap usage as well as kmap_atomic. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090311143449.GB22244@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-11 15:47:46 +01:00
Akinobu Mita	bb6d59ca92	x86: unify kmap_atomic_pfn() and iomap_atomic_prot_pfn() kmap_atomic_pfn() and iomap_atomic_prot_pfn() are almost same except pgprot. This patch removes the code duplication for these two functions. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> LKML-Reference: <20090311143317.GA22244@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-11 15:47:46 +01:00
Ingo Molnar	8293dd6f86	Merge branch 'x86/core' into tracing/ftrace Semantic merge: kernel/trace/trace_functions_graph.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-10 10:17:48 +01:00
Ingo Molnar	467c88fee5	Merge branches 'x86/apic', 'x86/asm', 'x86/fixmap', 'x86/memtest', 'x86/mm', 'x86/urgent', 'linus' and 'core/percpu' into x86/core	2009-03-10 09:26:38 +01:00
Jeremy Fitzhardinge	0feca851c1	x86-32: make sure virt_addr_valid() returns false for fixmap addresses I found that virt_addr_valid() was returning true for fixmap addresses. I'm not sure whether pfn_valid() is supposed to include this test, but there's no harm in being explicit. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Jiri Slaby <jirislaby@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <49B166D6.2080505@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-08 20:03:52 +01:00
Stuart Bennett	d0fc63f7bd	x86 mmiotrace: fix remove_kmmio_fault_pages() Impact: fix race+crash in mmiotrace The list manipulation in remove_kmmio_fault_pages() was broken. If more than one consecutive kmmio_fault_page was re-added during the grace period between unregister_kmmio_probe() and remove_kmmio_fault_pages(), the list manipulation failed to remove pages from the release list. After a second grace period the pages get into rcu_free_kmmio_fault_pages() and raise a BUG_ON() kernel crash. The list manipulation is fixed to properly remove pages from the release list. This bug has been present from the very beginning of mmiotrace in the mainline kernel. It was introduced in `0fd0e3da` ("x86: mmiotrace full patch, preview 1"); An urgent fix for Linus. Tested by Stuart (on 32-bit) and Pekka (on amd and intel 64-bit systems, nouveau and nvidia proprietary). Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Signed-off-by: Pekka Paalanen <pq@iki.fi> LKML-Reference: <20090308202135.34933feb@daedalus.pq.iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-08 19:51:23 +01:00
Yinghai Lu	e954ef20c2	x86: fix warning about nodeid Impact: cleanup Ingo found there warning about nodeid with some configs. try to use for_each_online_node for non numa too. in that case nodeid will be 0. also move out boundary checking from setup_node_bootmem(), so non-numa config will not check it. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <49B03069.80001@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-08 19:34:17 +01:00
Wang Chen	8827247ffc	x86: don't define __this_fixmap_does_not_exist() Impact: improve out-of-range fixmap index debugging Commit "1b42f51630c7eebce6fb780b480731eb81afd325" defined the __this_fixmap_does_not_exist() function with a WARN_ON(1) in it. This causes the linker to not report an error when __this_fixmap_does_not_exist() is called with a non-constant parameter. Ingo defined __this_fixmap_does_not_exist() because he wanted to get virt addresses of fix memory of nest level by non-constant index. But we can fix this and still keep the link-time check: We can get the four slot virt addresses on link time and store them to array slot_virt[]. Then we can then refer the slot_virt with non-constant index, in the ioremap-leak detection code. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> LKML-Reference: <49B2075B.4070509@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-08 17:07:47 +01:00
Ingo Molnar	f0ef039851	Merge branch 'x86/core' into tracing/textedit Conflicts: arch/x86/Kconfig block/blktrace.c kernel/irq/handle.c Semantic conflict: kernel/trace/blktrace.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 16:45:01 +01:00
Pekka Enberg	5dd61dfabc	x86: rename do_not_nx to disable_nx in mm/init_64.c As a preparational step for unifying noexec handling on 32-bit and 64-bit, rename the do_not_nx variable to disable_nx on 64-bit. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236265497.31324.11.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 15:25:52 +01:00
Pekka Enberg	c77a3b59c6	x86: fix uninitialized variable in init_memory_mapping() Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236265466.31324.9.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 15:25:52 +01:00
Yinghai Lu	d1a8e77920	x86: make "memtest" like "memtest=17" Impact: make boot command line "memtest" do one loop by default So don't need to guess many patterns in one loop. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <49B10532.3020105@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-06 12:16:43 +01:00
Ingo Molnar	28e93a005b	Merge branch 'x86/mm' into x86/core	2009-03-05 21:49:35 +01:00
Ingo Molnar	a1413c89ae	Merge branch 'x86/urgent' into x86/core Conflicts: arch/x86/include/asm/fixmap_64.h Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 21:48:50 +01:00
Jeremy Fitzhardinge	ed26dbe5ae	x86: pre-initialize boot_cpu_data.x86_phys_bits to avoid system_state tests Impact: cleanup, micro-optimization Pre-initialize boot_cpu_data.x86_phys_bits to a reasonable default to remove the use of system_state tests in __virt_addr_valid() and __phys_addr(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:53:43 +01:00
Jeremy Fitzhardinge	dc16ecf7fd	x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid Rather than relying on the ever-unreliable system_state, add a specific __vmalloc_start_set flag to indicate whether the vmalloc area has meaningful boundaries yet, and use that in x86-32's __phys_addr and __virt_addr_valid. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:53:10 +01:00
Ingo Molnar	62436fe9ee	x86: move init_memory_mapping() to common mm/init.c, build fix on 32-bit PAE Impact: build fix Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-14-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:39:03 +01:00
Pekka Enberg	4fcb208391	x86: move function and variable declarations to asm/init.h Impact: cleanup Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-17-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:18 +01:00
Pekka Enberg	e53fb04fce	x86: unify kernel_physical_mapping_init() function signatures Impact: cleanup In preparation for moving the function declaration to a header file, unify 32-bit and 64-bit signatures. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-16-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:18 +01:00
Pekka Enberg	298af9d89f	x86: fix up some bad global variable names in mm/init.c Impact: cleanup The table_start, table_end, and table_top are too generic for global namespace so rename them to be more specific. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-15-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:17 +01:00
Pekka Enberg	f765090a26	x86: move init_memory_mapping() to common mm/init.c Impact: cleanup This patch moves the init_memory_mapping() function to common mm/init.c. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-14-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:17 +01:00
Pekka Enberg	0c0f756fd6	x86: add stub init_gbpages() for 32-bit init_memory_mapping() Impact: cleanup This patch adds an empty static inline init_gbpages() for the 32-bit version of init_memory_mapping() making both versions identical. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-13-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:16 +01:00
Pekka Enberg	b47e3418c5	x86: ifdef 32-bit and 64-bit NR_RANGE_MR for save_mr() unification Impact: cleanup As a trivial preparation for moving common code to arc/x86/mm/init.c, ifdef the 32-bit and 64-bit versions of NR_RANGE_MR. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-12-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:16 +01:00
Pekka Enberg	c338d6f60f	x86: ifdef 32-bit and 64-bit pfn setup in init_memory_mapping() Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), ifdef configuration specific pfn setup code in the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-11-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:15 +01:00
Pekka Enberg	01ced9ec14	x86: ifdef 32-bit and 64-bit setup in init_memory_mapping() Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), ifdef configuration specific setup code in the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-10-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:15 +01:00
Pekka Enberg	d58e854e36	x86: add table start and end sanity checks to 32-bit init_memory_mapping() Impact: cleanup This patch adds a sanity check to the 32-bit version of init_memory_mapping() to reduce the diff to the 64-bit version. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-9-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:14 +01:00
Pekka Enberg	cbba65796d	x86: unify kernel_physical_mapping_init() call in init_memory_mapping() Impact: cleanup The 64-bit version of init_memory_mapping() uses the last mapped address returned from kernel_physical_mapping_init() whereas the 32-bit version doesn't. This patch adds relevant ifdefs to both versions of the function to reduce the diff between them. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-8-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:14 +01:00
Pekka Enberg	c464573cb3	x86: rename after_init_bootmem to after_bootmem in mm/init_32.c Impact: cleanup This patch renames after_init_bootmem to after_bootmem in mm/init_32.c to reduce the diff to the 64-bit version of of init_memory_mapping(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-7-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:13 +01:00
Pekka Enberg	96083ca11b	x86: remove unnecessary save_mr() sanity check Impact: cleanup The save_mr() function already checks that start_pfn is less than end_pfn so we can remove the unnecessary check which reduces the diff between the 32-bit and the 64-bit versions of init_memory_mapping(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-6-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:13 +01:00
Pekka Enberg	54e63f3a42	x86: ifdef 32-bit specific setup in init_memory_mapping() Impact: cleanup Enabling NX, PSE, and PGE are only required on 32-bit so ifdef them in both versions of the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-5-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:12 +01:00
Pekka Enberg	e7179853e7	x86: move pgd_base out of init_memory_mapping() Impact: cleanup This patch moves pgd_base out of init_memory_mapping() to reduce the diff between the 32-bit version and the 64-bit version of the function. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-4-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:12 +01:00
Pekka Enberg	49a2bf7303	x86: find_early_table_space() unification Impact: cleanup There are some minor differences between the 32-bit and 64-bit find_early_table_space() functions. This patch wraps those differences under CONFIG_X86_32 to make the function identical on both configurations. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-3-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:11 +01:00
Pekka Enberg	4bbd4fa038	x86: add gbpages support to 32-bit init_memory_mapping() Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), add gbpages support to the 32-bit version. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-2-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:11 +01:00
Pekka Enberg	c3f5d2d8b5	x86: init_memory_mapping() trivial cleanups Impact: cleanup To reduce the diff between the 32-bit and 64-bit versions of init_memory_mapping(), fix up all trivial issues. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1236257708-27269-1-git-send-email-penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-05 14:17:10 +01:00
Yinghai Lu	fc5efe3941	x86: fix bootmem cross node for 32bit numa, cleanup Impact: clean up Simplify the code, reuse some lines. Remove min_low_pfn reference, it is always 0 Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <49AEE2C4.2030602@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 22:09:59 +01:00
Pekka Enberg	731ddea636	x86: move free_initrd_mem() to common mm/init.c Impact: cleanup The function is identical on 32-bit and 64-bit configurations so move it to the common mm/init.c file. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236158020.29024.28.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 20:59:26 +01:00
Yinghai Lu	b68adb16f2	x86: make 32-bit init_memory_mapping range change more like 64-bit Impact: cleanup make code more readable and more like 64-bit Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <49AE48B4.8010907@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 20:55:03 +01:00
Yinghai Lu	a71edd1f46	x86: fix bootmem cross node for 32bit numa Impact: fix panic on system 2g x4 sockets Found one system with 4 sockets and every sockets has 2g can not boot with numa32 because boot mem is crossing nodes. So try to have numa version of setup_bootmem_allocator(). Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <49AE485B.8000902@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 20:55:03 +01:00
Ingo Molnar	6d2e91bf80	Merge branch 'x86/urgent' into x86/mm Conflicts: arch/x86/include/asm/fixmap_64.h Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 20:20:10 +01:00
Pekka Enberg	6298e719cf	x86: set_highmem_pages_init() cleanup, #2 Impact: cleanup The zones are set up at this stage so there's a highmem zone available even for the UMA case. The only difference there is that for machines that have CONFIG_HIGHMEM enabled but don't have any highmem available, ->zone_start_pfn is zero whereas highstart_pfn is non-zero). The field is left zeroed because of the !size test in free_area_init_core() but shouldn't be a problem because add_highpages_with_active_regions() handles empty ranges just fine. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Mel Gorman <mel@csn.ul.ie> LKML-Reference: <1236154567.29024.23.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 19:00:51 +01:00
Pekka Enberg	540aca06b7	x86: move devmem_is_allowed() to common mm/init.c Impact: cleanup The function is identical on 32-bit and 64-bit configurations so move it to the common mm/init.c file. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236160001.29024.29.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 11:40:04 +01:00
Ingo Molnar	a1be621dfa	Merge branch 'tracing/ftrace'; commit 'v2.6.29-rc7' into tracing/core	2009-03-04 11:14:47 +01:00
Jeremy Fitzhardinge	f254f3909e	x86: un-__init fill_pud/pmd/pte They are used by __set_fixmap->set_pte_vaddr_pud, which can be used by arch_setup_additional_pages(), and so is used after init. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-04 02:29:36 +01:00
Ingo Molnar	91d75e209b	Merge branch 'x86/core' into core/percpu	2009-03-04 02:29:19 +01:00
Ingo Molnar	8b0e5860cb	Merge branches 'x86/apic', 'x86/cpu', 'x86/fixmap', 'x86/mm', 'x86/sched', 'x86/setup-lzma', 'x86/signal' and 'x86/urgent' into x86/core	2009-03-04 02:22:31 +01:00
Linus Torvalds	3024e4a997	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: oprofile: don't set counter width from cpuid on Core2 x86: fix init_memory_mapping() to handle small ranges	2009-03-03 14:32:55 -08:00
Linus Torvalds	f2a4165526	Merge branch 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing/mmiotrace' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86 mmiotrace: fix race with release_kmmio_fault_page() x86 mmiotrace: improve handling of secondary faults x86 mmiotrace: split set_page_presence() x86 mmiotrace: fix save/restore page table state x86 mmiotrace: WARN_ONCE if dis/arming a page fails x86: add far read test to testmmiotrace x86: count errors in testmmiotrace.ko	2009-03-03 14:32:37 -08:00
Ingo Molnar	03787ceed8	x86: set_highmem_pages_init() cleanup, fix !CONFIG_NUMA && CONFIG_HIGHMEM=y Impact: build fix arch/x86/mm/highmem_32.c:187: error: static declaration of 'set_highmem_pages_init' follows non-static declaration arch/x86/include/asm/numa_32.h:8: error: previous declaration of 'set_highmem_pages_init' was here Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236082212.2675.24.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 15:32:24 +01:00
Pekka Enberg	867c5b5292	x86: set_highmem_pages_init() cleanup Impact: cleanup This patch moves set_highmem_pages_init() to arch/x86/mm/highmem_32.c. The declaration of the function is kept in asm/numa_32.h because asm/highmem.h is included only if CONFIG_HIGHMEM is enabled so we can't put the empty static inline function there. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236082212.2675.24.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 13:13:15 +01:00
Pekka Enberg	e5b2bb5527	x86: unify free_init_pages() and free_initmem() Impact: unification This patch introduces a common arch/x86/mm/init.c and moves the identical free_init_pages() and free_initmem() functions to the file. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236078906.2675.18.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:21:18 +01:00
Pekka Enberg	e087edd8c0	x86: make sure initmem is writable on 64-bit Impact: unification This patch ports commit `3c1df68b84` ("x86: make sure initmem is writable") to the 64-bit version to unify implementations of free_init_pages(). Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Arjan van de Ven <arjan@linux.intel.com> LKML-Reference: <1236078904.2675.17.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:21:18 +01:00
Pekka Enberg	05f209e7b9	x86: add sanity checks to init_32.c Impact: unification This patch adds sanity checks that are already in init_64.c to init_32.c. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236078902.2675.16.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:21:17 +01:00
Pekka Enberg	fd578f9c0a	x86: use roundup() instead of PAGE_ALIGN() in find_early_table_space() Impact: cleanup This patch changes find_early_table_space() to use roundup() for rounding up tables to page size to unify the common parts of the 32-bit and 64-bit implementations. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236077705.2675.6.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:07:00 +01:00
Pekka Enberg	2b688dfd0a	x86: move __VMALLOC_RESERVE to pgtable_32.c Impact: cleanup The __VMALLOC_RESERVE global variable is not used in init_32.c. Move that to pgtable_32.c to reduce the diff between init_32.c and init_64.c. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> LKML-Reference: <1236077704.2675.4.camel@penberg-laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 12:06:59 +01:00
Yinghai Lu	0fc59d3a01	x86: fix init_memory_mapping() to handle small ranges Impact: fix failed EFI bootup in certain circumstances Ying Huang found init_memory_mapping() has problem with small ranges less than 2M when he tried to direct map the EFI runtime code out of max_low_pfn_mapped. It turns out we never considered that case and didn't check the range... Reported-by: Ying Huang <ying.huang@intel.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Brian Maly <bmaly@redhat.com> LKML-Reference: <49ACDDED.1060508@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-03 08:50:22 +01:00
Ingo Molnar	fdfa66ab45	Merge branches 'tracing/ftrace', 'tracing/mmiotrace' and 'linus' into tracing/core	2009-03-02 22:37:35 +01:00
Pekka Paalanen	340430c572	x86 mmiotrace: fix race with release_kmmio_fault_page() There was a theoretical possibility to a race between arming a page in post_kmmio_handler() and disarming the page in release_kmmio_fault_page(): cpu0 cpu1 ------------------------------------------------------------------ mmiotrace shutdown enter release_kmmio_fault_page fault on the page disarm the page disarm the page handle the MMIO access re-arm the page put the page on release list remove_kmmio_fault_pages() fault on the page page not known to mmiotrace fall back to do_page_fault() KABOOM (This scenario also shows the double disarm case which is allowed.) Fixed by acquiring kmmio_lock in post_kmmio_handler() and checking if the page is being released from mmiotrace. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:37 +01:00
Stuart Bennett	3e39aa156a	x86 mmiotrace: improve handling of secondary faults Upgrade some kmmio.c debug messages to warnings. Allow secondary faults on probed pages to fall through, and only log secondary faults that are not due to non-present pages. Patch edited by Pekka Paalanen. Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:37 +01:00
Pekka Paalanen	0b700a6a25	x86 mmiotrace: split set_page_presence() From 36772dcb6ffbbb68254cbfc379a103acd2fbfefc Mon Sep 17 00:00:00 2001 From: Pekka Paalanen <pq@iki.fi> Date: Sat, 28 Feb 2009 21:34:59 +0200 Split set_page_presence() in kmmio.c into two more functions set_pmd_presence() and set_pte_presence(). Purely code reorganization, no functional changes. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:36 +01:00
Pekka Paalanen	5359b585fb	x86 mmiotrace: fix save/restore page table state From baa99e2b32449ec7bf147c234adfa444caecac8a Mon Sep 17 00:00:00 2001 From: Pekka Paalanen <pq@iki.fi> Date: Sun, 22 Feb 2009 20:02:43 +0200 Blindly setting _PAGE_PRESENT in disarm_kmmio_fault_page() overlooks the possibility, that the page was not present when it was armed. Make arm_kmmio_fault_page() store the previous page presence in struct kmmio_fault_page and use it on disarm. This patch was originally written by Stuart Bennett, but Pekka Paalanen rewrote it a little different. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:36 +01:00
Stuart Bennett	e9d54cae8f	x86 mmiotrace: WARN_ONCE if dis/arming a page fails Print a full warning once, if arming or disarming a page fails. Also, if initial arming fails, do not handle the page further. This avoids the possibility of a page failing to arm and then later claiming to have handled any fault on that page. WARN_ONCE added by Pekka Paalanen. Signed-off-by: Stuart Bennett <stuart@freedesktop.org> Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:35 +01:00
Pekka Paalanen	5ff93697fc	x86: add far read test to testmmiotrace Apparently pages far into an ioremapped region might not actually be mapped during ioremap(). Add an optional read test to try to trigger a multiply faulting MMIO access. Also add more messages to the kernel log to help debugging. This patch is based on a patch suggested by Stuart Bennett <stuart@freedesktop.org> who discovered bugs in mmiotrace related to normal kernel space faults. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:35 +01:00
Pekka Paalanen	fab852aaf7	x86: count errors in testmmiotrace.ko Check the read values against the written values in the MMIO read/write test. This test shows if the given MMIO test area really works as memory, which is a prerequisite for a successful mmiotrace test. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Stuart Bennett <stuart@freedesktop.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-02 10:20:34 +01:00
Ingo Molnar	55f2b78995	Merge branch 'x86/urgent' into x86/pat	2009-03-01 12:47:58 +01:00
Ingo Molnar	f5c1aa1537	Revert "gpu/drm, x86, PAT: PAT support for io_mapping_*" This reverts commit `17581ad812`. Sitsofe Wheeler reported that /dev/dri/card0 is MIA on his EeePC 900 and bisected it to this commit. Graphics card is an i915 in an EeePC 900: 00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile 915GM/GMS/910GML Express Graphics Controller [8086:2592] (rev 04) ( Most likely the ioremap() of the driver failed and hence the card did not initialize. ) Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com> Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-03-01 12:47:49 +01:00
Ingo Molnar	92b9af9e4f	x86: i915 needs pgprot_writecombine() and is_io_mapping_possible() Impact: build fix Theodore Ts reported that the i915 driver needs these symbols: ERROR: "pgprot_writecombine" [drivers/gpu/drm/i915/i915.ko] undefined! ERROR: "is_io_mapping_possible" [drivers/gpu/drm/i915/i915.ko] undefined! Reported-by: Theodore Ts'o <tytso@mit.edu> wrote: Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-28 14:22:44 +01:00
Gustavo F. Padovan	fd862dde18	x86, fixmap: define reserve_top_address for x86_64 Impact: new interface (not yet use) Define reserve_top_address for x86_64; only for later x86 integration. Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br> Acked-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2009-02-27 20:57:47 -08:00
Ingo Molnar	ecc25fbd6b	Merge branches 'x86/apic', 'x86/defconfig', 'x86/memtest', 'x86/mm' and 'linus' into x86/core	2009-02-26 06:31:32 +01:00
Ingo Molnar	801c0be814	Merge branches 'x86/urgent' and 'x86/pat' into x86/core Conflicts: arch/x86/include/asm/pat.h	2009-02-26 06:31:23 +01:00
Ingo Molnar	13b2eda64d	Merge branch 'x86/urgent' into x86/core Conflicts: arch/x86/mach-voyager/voyager_smp.c	2009-02-26 06:30:42 +01:00
Ingo Molnar	13093cb0e5	gpu/drm, x86, PAT: PAT support for io_mapping_*, export symbols for modules Impact: build fix ERROR: "reserve_io_memtype_wc" [drivers/gpu/drm/i915/i915.ko] undefined! ERROR: "free_io_memtype" [drivers/gpu/drm/i915/i915.ko] undefined! Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-26 03:43:53 +01:00
Ingo Molnar	2e31add2a7	Merge branch 'x86/urgent' into x86/pat	2009-02-25 16:40:10 +01:00
Venkatesh Pallipadi	17581ad812	gpu/drm, x86, PAT: PAT support for io_mapping_* Make io_mapping_create_wc and io_mapping_free go through PAT to make sure that there are no memory type aliases. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Eric Anholt <eric@anholt.net> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 13:09:52 +01:00
Venkatesh Pallipadi	4ab0d47d0a	gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t io_mapping_create_wc should take a resource_size_t parameter in place of unsigned long. With unsigned long, there will be no way to map greater than 4GB address in i386/32 bit. On x86, greater than 4GB addresses cannot be mapped on i386 without PAE. Return error for such a case. Patch also adds a structure for io_mapping, that saves the base, size and type on HAVE_ATOMIC_IOMAP archs, that can be used to verify the offset on io_mapping_map calls. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Eric Anholt <eric@anholt.net> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 13:09:51 +01:00
Venkatesh Pallipadi	7880f74645	gpu/drm, x86, PAT: routine to keep identity map in sync Add a function to check and keep identity maps in sync, when changing any memory type. One of the follow on patches will also use this routine. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Eric Anholt <eric@anholt.net> Cc: Keith Packard <keithp@keithp.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 13:09:51 +01:00
Andreas Herrmann	63823126c2	x86: memtest: add additional (regular) test patterns Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:19:47 +01:00
Andreas Herrmann	bfb4dc0da4	x86: memtest: wipe out test pattern from memory Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:19:46 +01:00
Andreas Herrmann	570c9e69aa	x86: memtest: adapt log messages - print test pattern instead of pattern number, - show pattern as stored in memory, - use proper priority flags, - consistent use of u64 throughout the code Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:19:46 +01:00
Andreas Herrmann	7dad169e57	x86: memtest: cleanup memtest function Impact: code cleanup Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:19:45 +01:00
Andreas Herrmann	6d74171bf7	x86: memtest: introduce array to select memtest patterns Impact: code cleanup Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:19:45 +01:00
Andreas Herrmann	40823f737e	x86: memtest: reuse test patterns when memtest parameter exceeds number of available patterns Impact: fix unexpected behaviour when pattern number is out of range Current implementation provides 4 patterns for memtest. The code doesn't check whether the memtest parameter value exceeds the maximum pattern number. Instead the memtest code pretends to test with non-existing patterns, e.g. when booting with memtest=10 I've observed the following ... early_memtest: pattern num 10 0000001000 - 0000006000 pattern 0 ... 0000001000 - 0000006000 pattern 1 ... 0000001000 - 0000006000 pattern 2 ... 0000001000 - 0000006000 pattern 3 ... 0000001000 - 0000006000 pattern 4 ... 0000001000 - 0000006000 pattern 5 ... 0000001000 - 0000006000 pattern 6 ... 0000001000 - 0000006000 pattern 7 ... 0000001000 - 0000006000 pattern 8 ... 0000001000 - 0000006000 pattern 9 ... But in fact Linux didn't test anything for patterns > 4 as the default case in memtest() is to leave the function. I suggest to use the memtest parameter as the number of tests to be performed and to re-iterate over all existing patterns. Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-25 12:19:44 +01:00
Ingo Molnar	0edcf8d692	Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu Conflicts: arch/x86/include/asm/pgtable.h	2009-02-24 21:52:45 +01:00
Ingo Molnar	a852cbfaaf	Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core	2009-02-24 21:50:43 +01:00
Tejun Heo	458a3e644c	x86: update populate_extra_pte() and add populate_extra_pmd() Impact: minor change to populate_extra_pte() and addition of pmd flavor Update populate_extra_pte() to return pointer to the pte_t for the specified address and add populate_extra_pmd() which only populates till the pmd and returns pointer to the pmd entry for the address. For 64bit, pud/pmd/pte fill functions are separated out from set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and populate_extra_{pte\|pmd}(). Signed-off-by: Tejun Heo <tj@kernel.org>	2009-02-24 11:57:21 +09:00
Ingo Molnar	fc6fc7f1b1	Merge branch 'linus' into x86/apic Conflicts: arch/x86/mach-default/setup.c Semantic conflict resolution: arch/x86/kernel/setup.c Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-22 20:05:19 +01:00
Ingo Molnar	b319eed0aa	x86, mm: fault.c, simplify kmmio_fault(), cleanup Clarify the kmmio_fault() comment. Acked-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-22 10:24:18 +01:00
Hannes Eder	2366c298b5	x86: numa_32.c: fix sparse warning: Using plain integer as NULL pointer Fix this sparse warning: arch/x86/mm/numa_32.c:197:24: warning: Using plain integer as NULL pointer Signed-off-by: Hannes Eder <hannes@hanneseder.net> Cc: trivial@kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-22 09:27:12 +01:00
Ingo Molnar	f8eeb2e6be	x86, mm: fault.c, update copyrights Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-20 23:13:36 +01:00
Ingo Molnar	cd1b68f08f	x86, mm: fault.c, give another attempt at prefetch handing before SIGBUS Impact: extend prefetch handling on 64-bit Currently there's an extra is_prefetch() check done in do_sigbus(), which we only do on 32 bits. This is a last-ditch check before we terminate a task, so it's worth giving prefetch instructions another chance - should none of our existing quirks have caught a prefetch instruction related spurious fault. The only risk is if a prefetch causes a real sigbus, in that case we'll not OOM but try another fault. But this code has been on 32-bit for a long time, so it should be fine in practice. So do this on 64-bit too - and thus remove one more #ifdef. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:46 +01:00
Ingo Molnar	7c178a26d3	x86, mm: fault.c, remove #ifdef from fault_in_kernel_space() Impact: cleanup Removal of an #ifdef in fault_in_kernel_space(), by making use of the new TASK_SIZE_MAX symbol which is now available on 32-bit too. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:45 +01:00
Ingo Molnar	d951734654	x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX Impact: cleanup Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the define on 32-bit too. (mapped to TASK_SIZE) This allows 32-bit code to make use of the (former-) TASK_SIZE64 symbol as well, in a clean way. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:44 +01:00
Ingo Molnar	c3731c6866	x86, mm: fault.c, remove #ifdef from do_page_fault() Impact: cleanup do_page_fault() has this ugly #ifdef in its prototype: #ifdef CONFIG_X86_64 asmlinkage #endif void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code) Replace it with 'dotraplinkage' which maps to exactly the above construct: nothing on 32-bit and asmlinkage on 64-bit. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:44 +01:00
Ingo Molnar	1cc99544dd	x86, mm: fault.c, unify oops handling Impact: add oops-recursion check to 32-bit Unify the oops state-machine, to the 64-bit version. It is slightly more careful in that it does a recursion check in oops_begin(), and is thus more likely to show the relevant oops. It also means that 32-bit will print one more line at the end of pagefault triggered oopses: printk(KERN_EMERG "CR2: %016lx\n", address); Which is generally good information to be seen in partial-dump digital-camera jpegs ;-) The downside is the somewhat more complex critical path. Both variants have been tested well meanwhile by kernel developers crashing their boxes so i dont think this is a practical worry. This removes 3 ugly #ifdefs from no_context() and makes the function a lot nicer read. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:44 +01:00
Ingo Molnar	8f7661496c	x86, mm: fault.c, unify oops printing Impact: refine/extend page fault related oops printing on 64-bit - honor the pause_on_oops logic on 64-bit too - print out NX fault warnings on 64-bit as well - factor out the NX fault message to make it git-greppable and readable Note that this means that we do the PF_INSTR check on 32-bit non-PAE as well where it should not occur ... normally. Cannot hurt. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:43 +01:00
Ingo Molnar	f2f13a8535	x86, mm: fault.c, reorder functions Impact: cleanup Avoid a couple more #ifdefs by moving fundamentally non-unifiable functions into a single #ifdef 32-bit / #else / #endif block in fault.c: vmalloc*(), dump_pagetable(), check_vm8086_mode(). No code changed: text data bss dec hex filename 4618 32 24 4674 1242 fault.o.before 4618 32 24 4674 1242 fault.o.after Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:43 +01:00
Ingo Molnar	b18018126f	x86, mm, kprobes: fault.c, simplify notify_page_fault() Impact: cleanup Remove an #ifdef from notify_page_fault(). The function still compiles to nothing in the !CONFIG_KPROBES case. Introduce kprobes_built_in() and kprobe_fault_handler() helpers to allow this - they returns 0 if !CONFIG_KPROBES. No code changed: text data bss dec hex filename 4618 32 24 4674 1242 fault.o.before 4618 32 24 4674 1242 fault.o.after Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:42 +01:00
Ingo Molnar	b814d41f09	x86, mm: fault.c, simplify kmmio_fault() Impact: cleanup Remove an #ifdef from kmmio_fault() - we can do this by providing default implementations for is_kmmio_active() and kmmio_handler(). The compiler optimizes it all away in the !CONFIG_MMIOTRACE case. Also, while at it, clean up mmiotrace.h a bit: - standard header guards - standard vertical spaces for structure definitions No code changed (both with mmiotrace on and off in the config): text data bss dec hex filename 2947 12 12 2971 b9b fault.o.before 2947 12 12 2971 b9b fault.o.after Cc: Pekka Paalanen <pq@iki.fi> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:42 +01:00
Ingo Molnar	121d5d0a7e	x86, mm: fault.c, enable PF_RSVD checks on 32-bit too Impact: improve page fault handling robustness The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a relatively recent addition to x86 CPUs, so the 32-bit do_fault() implementation never had it. This flag gets set when the CPU detects nonzero values in any reserved bits of the page directory entries. Extend the existing 64-bit check for PF_RSVD in do_page_fault() to 32-bit too. If we detect such a fault then we print a more informative oops and the pagetables. This unifies the code some more, removes an ugly #ifdef and improves the 32-bit page fault code robustness a bit. It slightly increases the 32-bit kernel text size. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:41 +01:00
Ingo Molnar	8c938f9fae	x86, mm: fault.c, factor out the vm86 fault check Impact: cleanup Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check in do_page_fault(), put it into the check_v8086_mode() helper function and merge it with an existing #ifdef. Also, simplify the code flow a tiny bit in the helper. No code changed: arch/x86/mm/fault.o: text data bss dec hex filename 2711 12 12 2735 aaf fault.o.before 2711 12 12 2735 aaf fault.o.after Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:41 +01:00
Ingo Molnar	107a03678c	x86, mm: fault.c, refactor/simplify the is_prefetch() code Impact: no functionality changed Factor out the opcode checker into a helper inline. The code got a tiny bit smaller: text data bss dec hex filename 4632 32 24 4688 1250 fault.o.before 4618 32 24 4674 1242 fault.o.after And it got cleaner / easier to review as well. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:40 +01:00
Ingo Molnar	2d4a71676f	x86, mm: fault.c cleanup Impact: cleanup, no code changed Clean up various small details, which can be correctness checked automatically: - tidy up the include file section - eliminate unnecessary includes - introduce show_signal_msg() to clean up code flow - standardize the code flow - standardize comments and other style details - more cleanups, pointed out by checkpatch No code changed on either 32-bit nor 64-bit: arch/x86/mm/fault.o: text data bss dec hex filename 4632 32 24 4688 1250 fault.o.before 4632 32 24 4688 1250 fault.o.after the md5 changed due to a change in a single instruction: 2e8a8241e7f0d69706776a5a26c90bc0 fault.o.before.asm c5c3d36e725586eb74f0e10692f0193e fault.o.after.asm Because a __LINE__ reference in a WARN_ONCE() has changed. On 32-bit a few stack offsets changed - no code size difference nor any functionality difference. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-21 00:09:39 +01:00
Steven Rostedt	1623963097	ftrace, x86: make kernel text writable only for conversions Impact: keep kernel text read only Because dynamic ftrace converts the calls to mcount into and out of nops at run time, we needed to always keep the kernel text writable. But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts the kernel code to writable before ftrace modifies the text, and converts it back to read only afterward. The kernel text is converted to read/write, stop_machine is called to modify the code, then the kernel text is converted back to read only. The original version used SYSTEM_STATE to determine when it was OK or not to change the code to rw or ro. Andrew Morton pointed out that using SYSTEM_STATE is a bad idea since there is no guarantee to what its state will actually be. Instead, I moved the check into the set_kernel_text_* functions themselves, and use a local variable to determine when it is OK to change the kernel text RW permissions. [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ] Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-20 14:30:06 -05:00
Ingo Molnar	c9e1585b1b	Merge branch 'tip/x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into x86/mm	2009-02-20 18:51:43 +01:00
Ingo Molnar	7a5714e018	x86, pat: add large-PAT check to split_large_page() Impact: future-proof the split_large_page() function Linus noticed that split_large_page() is not safe wrt. the PAT bit: it is bit 12 on the 1GB and 2MB page table level (_PAGE_BIT_PAT_LARGE), and it is bit 7 on the 4K page table level (_PAGE_BIT_PAT). Currently it is not a problem because we never set _PAGE_BIT_PAT_LARGE on any of the large-page mappings - but should this happen in the future the split_large_page() would silently lift bit 12 into the lowlevel 4K pte and would start corrupting the physical page frame offset. Not fun. So add a debug warning, to make sure if something ever sets the PAT bit then this function gets updated too. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-20 17:48:49 +01:00
Steven Rostedt	3c3e5694ad	x86: check PMD in spurious_fault handler Impact: fix to prevent hard lockup on bad PMD permissions If the PMD does not have the correct permissions for a page access, but the PTE does, the spurious fault handler will mistake the fault as a lazy TLB transaction. This will result in an infinite loop of: fault -> spurious_fault check (pass) -> return to code -> fault This patch adds a check and a warn on if the PTE passes the permissions but the PMD does not. [ Updated: Ingo Molnar suggested using WARN_ONCE with some text ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>	2009-02-20 11:44:47 -05:00
Ingo Molnar	3b6f7b9beb	Merge branch 'x86/urgent' into x86/core	2009-02-20 17:40:43 +01:00
Ingo Molnar	07a66d7c53	x86: use the right protections for split-up pagetables Steven Rostedt found a bug in where in his modified kernel ftrace was unable to modify the kernel text, due to the PMD itself having been marked read-only as well in split_large_page(). The fix, suggested by Linus, is to not try to 'clone' the reference protection of a huge-page, but to use the standard (and permissive) page protection bits of KERNPG_TABLE. The 'cloning' makes sense for the ptes but it's a confused and incorrect concept at the page table level - because the pagetable entry is a set of all ptes and hence cannot 'clone' any single protection attribute - the ptes can be any mixture of protections. With the permissive KERNPG_TABLE, even if the pte protections get changed after this point (due to ftrace doing code-patching or other similar activities like kprobes), the resulting combined protections will still be correct and the pte's restrictive (or permissive) protections will control it. Also update the comment. This bug was there for a long time but has not caused visible problems before as it needs a rather large read-only area to trigger. Steve possibly hacked his kernel with some really large arrays or so. Anyway, the bug is definitely worth fixing. [ Huang Ying also experienced problems in this area when writing the EFI code, but the real bug in split_large_page() was not realized back then. ] Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Huang Ying <ying.huang@intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2009-02-20 08:35:03 +01:00
Tejun Heo	11124411aa	x86: convert to the new dynamic percpu allocator Impact: use new dynamic allocator, unified access to static/dynamic percpu memory Convert to the new dynamic percpu allocator. * implement populate_extra_pte() for both 32 and 64 * update setup_per_cpu_areas() to use pcpu_setup_static() * define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr() * define config HAVE_DYNAMIC_PER_CPU_AREA Signed-off-by: Tejun Heo <tj@kernel.org>	2009-02-20 16:29:09 +09:00
KAMEZAWA Hiroyuki	f2dbcfa738	mm: clean up for early_pfn_to_nid() What's happening is that the assertion in mm/page_alloc.c:move_freepages() is triggering: BUG_ON(page_zone(start_page) != page_zone(end_page)); Once I knew this is what was happening, I added some annotations: if (unlikely(page_zone(start_page) != page_zone(end_page))) { printk(KERN_ERR "move_freepages: Bogus zones: " "start_page[%p] end_page[%p] zone[%p]\n", start_page, end_page, zone); printk(KERN_ERR "move_freepages: " "start_zone[%p] end_zone[%p]\n", page_zone(start_page), page_zone(end_page)); printk(KERN_ERR "move_freepages: " "start_pfn[0x%lx] end_pfn[0x%lx]\n", page_to_pfn(start_page), page_to_pfn(end_page)); printk(KERN_ERR "move_freepages: " "start_nid[%d] end_nid[%d]\n", page_to_nid(start_page), page_to_nid(end_page)); ... And here's what I got: move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00] move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00] move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff] move_freepages: start_nid[1] end_nid[0] My memory layout on this box is: [ 0.000000] Zone PFN ranges: [ 0.000000] Normal 0x00000000 -> 0x0081ff5d [ 0.000000] Movable zone start PFN for each node [ 0.000000] early_node_map[8] active PFN ranges [ 0.000000] 0: 0x00000000 -> 0x00020000 [ 0.000000] 1: 0x00800000 -> 0x0081f7ff [ 0.000000] 1: 0x0081f800 -> 0x0081fe50 [ 0.000000] 1: 0x0081fed1 -> 0x0081fed8 [ 0.000000] 1: 0x0081feda -> 0x0081fedb [ 0.000000] 1: 0x0081fedd -> 0x0081fee5 [ 0.000000] 1: 0x0081fee7 -> 0x0081ff51 [ 0.000000] 1: 0x0081ff59 -> 0x0081ff5d So it's a block move in that 0x81f600-->0x81f7ff region which triggers the problem. This patch: Declaration of early_pfn_to_nid() is scattered over per-arch include files, and it seems it's complicated to know when the declaration is used. I think it makes fix-for-memmap-init not easy. This patch moves all declaration to include/linux/mm.h After this, if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use static definition in include/linux/mm.h else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID -> Use generic definition in mm/page_alloc.c else -> per-arch back end function will be called. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: David Miller <davem@davemlloft.net> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@kernel.org> [2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2009-02-18 15:37:55 -08:00
Linus Torvalds	35010334aa	Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, vm86: fix preemption bug x86, olpc: fix model detection without OFW x86, hpet: fix for LS21 + HPET = boot hang x86: CPA avoid repeated lazy mmu flush x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem x86/cpa: make sure cpa is safe to call in lazy mmu mode x86, ptrace, mm: fix double-free on race	2009-02-17 14:27:39 -08:00

... 14 15 16 17 18 ...

2380 Commits