linux/arch/x86
Luiz Capitulino 8b375f64dc x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()
The setup_node_data() function allocates a pg_data_t object,
inserts it into the node_data[] array and initializes the
following fields: node_id, node_start_pfn and
node_spanned_pages.

However, a few function calls later during the kernel boot,
free_area_init_node() re-initializes those fields, possibly with
setup_node_data() is not used.

This causes a small glitch when running Linux as a hyperv numa
guest:

  SRAT: PXM 0 -> APIC 0x00 -> Node 0
  SRAT: PXM 0 -> APIC 0x01 -> Node 0
  SRAT: PXM 1 -> APIC 0x02 -> Node 1
  SRAT: PXM 1 -> APIC 0x03 -> Node 1
  SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
  SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
  SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
  NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
  Initmem setup node 0 [mem 0x00000000-0x7fffffff]
    NODE_DATA [mem 0x7ffdc000-0x7ffeffff]
  Initmem setup node 1 [mem 0x80800000-0x1081fffff]
    NODE_DATA [mem 0x1081ea000-0x1081fdfff]
  crashkernel: memory value expected
   [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
   [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
  Zone ranges:
    DMA      [mem 0x00001000-0x00ffffff]
    DMA32    [mem 0x01000000-0xffffffff]
    Normal   [mem 0x100000000-0x1081fffff]
  Movable zone start for each node
  Early memory node ranges
    node   0: [mem 0x00001000-0x0009efff]
    node   0: [mem 0x00100000-0x7ffeffff]
    node   1: [mem 0x80200000-0xf7ffffff]
    node   1: [mem 0x100000000-0x1081fffff]
  On node 0 totalpages: 524174
    DMA zone: 64 pages used for memmap
    DMA zone: 21 pages reserved
    DMA zone: 3998 pages, LIFO batch:0
    DMA32 zone: 8128 pages used for memmap
    DMA32 zone: 520176 pages, LIFO batch:31
  On node 1 totalpages: 524288
    DMA32 zone: 7672 pages used for memmap
    DMA32 zone: 491008 pages, LIFO batch:31
    Normal zone: 520 pages used for memmap
    Normal zone: 33280 pages, LIFO batch:7

In this dmesg, the SRAT table reports that the memory range for
node 1 starts at 0x80200000.  However, the line starting with
"Initmem" reports that node 1 memory range starts at 0x80800000.
 The "Initmem" line is reported by setup_node_data() and is
wrong, because the kernel ends up using the range as reported in
the SRAT table.

This commit drops all that dead code from setup_node_data(),
renames it to alloc_node_data() and adds a printk() to
free_area_init_node() so that we report a node's memory range
accurately.

Here's the same dmesg section with this patch applied:

   SRAT: PXM 0 -> APIC 0x00 -> Node 0
   SRAT: PXM 0 -> APIC 0x01 -> Node 0
   SRAT: PXM 1 -> APIC 0x02 -> Node 1
   SRAT: PXM 1 -> APIC 0x03 -> Node 1
   SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
   SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff]
   SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff]
   NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff]
   NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff]
   NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff]
   crashkernel: memory value expected
    [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0
    [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1
   Zone ranges:
     DMA      [mem 0x00001000-0x00ffffff]
     DMA32    [mem 0x01000000-0xffffffff]
     Normal   [mem 0x100000000-0x1081fffff]
   Movable zone start for each node
   Early memory node ranges
     node   0: [mem 0x00001000-0x0009efff]
     node   0: [mem 0x00100000-0x7ffeffff]
     node   1: [mem 0x80200000-0xf7ffffff]
     node   1: [mem 0x100000000-0x1081fffff]
   Initmem setup node 0 [mem 0x00001000-0x7ffeffff]
   On node 0 totalpages: 524174
     DMA zone: 64 pages used for memmap
     DMA zone: 21 pages reserved
     DMA zone: 3998 pages, LIFO batch:0
     DMA32 zone: 8128 pages used for memmap
     DMA32 zone: 520176 pages, LIFO batch:31
   Initmem setup node 1 [mem 0x80200000-0x1081fffff]
   On node 1 totalpages: 524288
     DMA32 zone: 7672 pages used for memmap
     DMA32 zone: 491008 pages, LIFO batch:31
     Normal zone: 520 pages used for memmap
     Normal zone: 33280 pages, LIFO batch:7

This commit was tested on a two node bare-metal NUMA machine and
Linux as a numa guest on hyperv and qemu/kvm.

PS: The wrong memory range reported by setup_node_data() seems to be
    harmless in the current kernel because it's just not used.  However,
    that bad range is used in kernel 2.6.32 to initialize the old boot
    memory allocator, which causes a crash during boot.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16 08:55:10 +02:00
..
boot * Enforce CONFIG_RELOCATABLE for the x86 EFI boot stub, otherwise 2014-08-11 13:58:54 -07:00
configs USB: remove CONFIG_USB_DEBUG from defconfig files 2014-05-28 09:40:45 -07:00
crypto Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2014-08-04 09:52:51 -07:00
ia32 x86, vdso: Reimplement vdso.so preparation in build-time C 2014-05-05 13:18:51 -07:00
include x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() 2014-09-16 08:55:10 +02:00
kernel Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-29 17:22:27 -07:00
kvm KVM: x86: do not check CS.DPL against RPL during task switch 2014-08-19 15:12:28 +02:00
lguest asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
lib Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
math-emu asmlinkage, x86: Add explicit __visible to arch/x86/* 2014-05-05 16:07:44 -07:00
mm x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data() 2014-09-16 08:55:10 +02:00
net net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
oprofile x86, oprofile, nmi: Fix CPU hotplug callback registration 2014-03-20 13:43:43 +01:00
pci x86, irq, PCI: Keep IRQ assignment for runtime power management 2014-08-29 13:38:00 +02:00
platform Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-13 18:23:32 -06:00
power x86, power, suspend: Annotate restore_processor_state() with notrace 2014-07-17 09:45:05 -04:00
purgatory x86/purgatory: use approprate -m64/-32 build flag for arch/x86/purgatory 2014-08-29 16:28:16 -07:00
realmode x86/build: Supress realmode.bin is up to date message 2014-04-16 15:17:24 +02:00
syscalls kexec: new syscall kexec_file_load() declaration 2014-08-08 15:57:32 -07:00
tools x86/build: Supress "Nothing to be done for ..." messages 2014-04-14 11:44:36 +02:00
um Merge branch 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc 2014-08-09 09:58:12 -07:00
vdso arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area 2014-08-08 15:57:27 -07:00
video
xen x86/xen: use vmap() to map grant table pages in PVH guests 2014-08-11 11:59:35 +01:00
.gitignore
Kbuild kexec: create a new config option CONFIG_KEXEC_FILE for new syscall 2014-08-29 16:28:16 -07:00
Kconfig kexec: create a new config option CONFIG_KEXEC_FILE for new syscall 2014-08-29 16:28:16 -07:00
Kconfig.cpu Merge branch 'x86-nuke-platforms-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-04-02 13:15:58 -07:00
Kconfig.debug x86/efi: Dump the EFI page table 2014-03-04 16:17:17 +00:00
Makefile kexec: purgatory: add clean-up for purgatory directory 2014-08-29 16:28:17 -07:00
Makefile_32.cpu
Makefile.um