linux/arch/x86/kernel
Naoya Horiguchi 124049decb x86/e820: put !E820_TYPE_RAM regions into memblock.reserved
There is a kernel panic that is triggered when reading /proc/kpageflags
on the kernel booted with kernel parameter 'memmap=nn[KMG]!ss[KMG]':

  BUG: unable to handle kernel paging request at fffffffffffffffe
  PGD 9b20e067 P4D 9b20e067 PUD 9b210067 PMD 0
  Oops: 0000 [#1] SMP PTI
  CPU: 2 PID: 1728 Comm: page-types Not tainted 4.17.0-rc6-mm1-v4.17-rc6-180605-0816-00236-g2dfb086ef02c+ #160
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014
  RIP: 0010:stable_page_flags+0x27/0x3c0
  Code: 00 00 00 0f 1f 44 00 00 48 85 ff 0f 84 a0 03 00 00 41 54 55 49 89 fc 53 48 8b 57 08 48 8b 2f 48 8d 42 ff 83 e2 01 48 0f 44 c7 <48> 8b 00 f6 c4 01 0f 84 10 03 00 00 31 db 49 8b 54 24 08 4c 89 e7
  RSP: 0018:ffffbbd44111fde0 EFLAGS: 00010202
  RAX: fffffffffffffffe RBX: 00007fffffffeff9 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffffed1182fff5c0
  RBP: ffffffffffffffff R08: 0000000000000001 R09: 0000000000000001
  R10: ffffbbd44111fed8 R11: 0000000000000000 R12: ffffed1182fff5c0
  R13: 00000000000bffd7 R14: 0000000002fff5c0 R15: ffffbbd44111ff10
  FS:  00007efc4335a500(0000) GS:ffff93a5bfc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: fffffffffffffffe CR3: 00000000b2a58000 CR4: 00000000001406e0
  Call Trace:
   kpageflags_read+0xc7/0x120
   proc_reg_read+0x3c/0x60
   __vfs_read+0x36/0x170
   vfs_read+0x89/0x130
   ksys_pread64+0x71/0x90
   do_syscall_64+0x5b/0x160
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7efc42e75e23
  Code: 09 00 ba 9f 01 00 00 e8 ab 81 f4 ff 66 2e 0f 1f 84 00 00 00 00 00 90 83 3d 29 0a 2d 00 00 75 13 49 89 ca b8 11 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8 db d3 01 00 48 89 04 24

According to kernel bisection, this problem became visible due to commit
f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
which changes how struct pages are initialized.

Memblock layout affects the pfn ranges covered by node/zone.  Consider
that we have a VM with 2 NUMA nodes and each node has 4GB memory, and
the default (no memmap= given) memblock layout is like below:

  MEMBLOCK configuration:
   memory size = 0x00000001fff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x4
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000100000000-0x000000013fffffff], 0x0000000040000000 bytes on node 0 flags: 0x0
   memory[0x3]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

If you give memmap=1G!4G (so it just covers memory[0x2]),
the range [0x100000000-0x13fffffff] is gone:

  MEMBLOCK configuration:
   memory size = 0x00000001bff75c00 reserved size = 0x000000000300c000
   memory.cnt  = 0x3
   memory[0x0]     [0x0000000000001000-0x000000000009efff], 0x000000000009e000 bytes on node 0 flags: 0x0
   memory[0x1]     [0x0000000000100000-0x00000000bffd6fff], 0x00000000bfed7000 bytes on node 0 flags: 0x0
   memory[0x2]     [0x0000000140000000-0x000000023fffffff], 0x0000000100000000 bytes on node 1 flags: 0x0
   ...

This causes shrinking node 0's pfn range because it is calculated by the
address range of memblock.memory.  So some of struct pages in the gap
range are left uninitialized.

We have a function zero_resv_unavail() which does zeroing the struct pages
within the reserved unavailable range (i.e.  memblock.memory &&
!memblock.reserved).  This patch utilizes it to cover all unavailable
ranges by putting them into memblock.reserved.

Link: http://lkml.kernel.org/r/20180615072947.GB23273@hori1.linux.bs1.fc.nec.co.jp
Fixes: f7f99100d8 ("mm: stop zeroing memory during allocation in vmemmap")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Tested-by: Oscar Salvador <osalvador@suse.de>
Tested-by: "Herton R. Krzesinski" <herton@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 11:16:44 -07:00
..
acpi x86/acpi: Prevent X2APIC id 0xffffffff from being accounted 2018-04-17 11:56:31 +02:00
apic x86/platform/UV: Add kernel parameter to set memory block size 2018-06-21 16:14:46 +02:00
cpu Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-24 19:59:52 +08:00
fpu Merge commit 'upstream-x86-entry' into WIP.x86/mm 2017-12-17 12:58:53 +01:00
kprobes kprobes/x86: Prohibit probing on exception masking instructions 2018-05-13 19:52:55 +02:00
.gitignore
alternative.c x86/paravirt: Remove 'noreplace-paravirt' cmdline option 2018-01-31 10:37:45 +01:00
amd_gart_64.c x86/dma/amd_gart: Use dma_direct_{alloc,free}() 2018-03-20 10:01:57 +01:00
amd_nb.c x86/amd_nb: Add support for Raven Ridge CPUs 2018-05-13 09:00:27 -07:00
apb_timer.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-12-25 14:30:04 -08:00
aperture_64.c x86/gart: Exclude GART aperture from vmcore 2018-01-11 15:09:24 +01:00
apm_32.c Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-04 19:17:47 -07:00
asm-offsets_32.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
asm-offsets_64.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
asm-offsets.c Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
audit_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
bootflag.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
check.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
cpuid.c x86/cpuid: Allow cpuid_read() to schedule 2018-03-27 12:01:48 +02:00
crash_dump_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crash_dump_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
crash.c kexec_file, x86: move re-factored code to generic side 2018-04-13 17:10:27 -07:00
devicetree.c Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-02 16:15:32 -07:00
doublefault.c x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss 2017-12-17 13:59:55 +01:00
dumpstack_32.c x86/dumpstack: Unify show_regs() 2018-03-08 12:04:59 +01:00
dumpstack_64.c x86/dumpstack: Unify show_regs() 2018-03-08 12:04:59 +01:00
dumpstack.c x86/dumpstack: Explain the reasoning for the prologue and buffer size 2018-04-26 16:15:28 +02:00
e820.c x86/e820: put !E820_TYPE_RAM regions into memblock.reserved 2018-06-28 11:16:44 -07:00
early_printk.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
early-quirks.c x86/early-quirks: Rename duplicate define of dev_err 2018-05-13 20:04:35 +02:00
ebda.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
eisa.c x86/eisa: Add missing include 2017-08-31 21:34:48 +02:00
espfix_64.c x86/espfix: Document use of _PAGE_GLOBAL 2018-04-09 18:27:33 +02:00
ftrace_32.S x86/retpoline/ftrace: Convert ftrace assembler indirect jumps 2018-01-12 00:14:30 +01:00
ftrace_64.S Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-01-28 12:19:23 -08:00
ftrace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
head32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
head64.c Revert "x86/mm: Mark __pgtable_l5_enabled __initdata" 2018-06-23 14:20:37 +02:00
head_32.S Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables 2018-06-14 12:21:18 +09:00
head_64.S x86/mm: Comment _PAGE_GLOBAL mystery 2018-04-12 09:05:58 +02:00
hpet.c treewide: kzalloc() -> kcalloc() 2018-06-12 16:19:22 -07:00
hw_breakpoint.c
i8237.c x86/i8237: Register device based on FADT legacy boot flag 2018-04-27 16:44:29 +02:00
i8253.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
i8259.c Merge branch 'linus' into x86/apic, to resolve conflicts 2017-11-07 10:51:10 +01:00
idt.c x86/idt: Simplify the idt_setup_apic_and_irq_gates() 2018-06-06 13:38:01 +02:00
io_delay.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
ioport.c x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm() 2018-04-02 20:16:12 +02:00
irq_32.c x86/retpoline/irq32: Convert assembler indirect jumps 2018-01-12 00:14:32 +01:00
irq_64.c x86/irq/64: Print the offending IP in the stack overflow warning 2017-12-17 13:59:53 +01:00
irq_work.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq.c Drivers: hv: vmbus: Implement Direct Mode for stimer0 2018-03-06 09:57:17 -08:00
irqinit.c x86/apic: Simplify init_bsp_APIC() usage 2018-02-13 17:30:38 +01:00
itmt.c x86/headers: Remove duplicate #includes 2017-12-12 11:32:24 +01:00
jailhouse.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
jump_label.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kdebugfs.c x86, mpparse, x86/acpi, x86/PCI, x86/dmi, SFI: Use memremap() for RAM mappings 2017-07-18 11:37:58 +02:00
kexec-bzimage64.c kexec_file: do not add extra alignment to efi memmap 2018-04-20 17:18:36 -07:00
kgdb.c
ksysfs.c treewide: kmalloc() -> kmalloc_array() 2018-06-12 16:19:22 -07:00
kvm.c kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME 2018-05-17 19:12:13 +02:00
kvmclock.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
ldt.c x86/ldt: Fix support_pte_mask filtering in map_ldt_struct() 2018-04-16 11:20:34 -07:00
livepatch.c
machine_kexec_32.c x86/kexec: Avoid double free_page() upon do_kexec_load() failure 2018-05-13 19:50:06 +02:00
machine_kexec_64.c x86/mm: Stop pretending pgtable_l5_enabled is a variable 2018-05-19 11:56:57 +02:00
Makefile Merge branch 'x86-dma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-04-02 17:18:45 -07:00
mmconf-fam10h_64.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
module.c x86: Treat R_X86_64_PLT32 as R_X86_64_PC32 2018-02-22 09:01:10 -08:00
mpparse.c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-02-14 17:02:15 -08:00
msr.c x86/msr: Remove bogus cleanup from the error path 2016-12-25 10:47:41 +01:00
nmi_selftest.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
nmi.c locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() 2017-10-25 11:01:08 +02:00
paravirt_patch_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
paravirt_patch_64.c x86/paravirt: Dont patch flush_tlb_single 2017-12-17 14:27:52 +01:00
paravirt-spinlocks.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
paravirt.c x86/mm: Rename flush_tlb_single() and flush_tlb_one() to __flush_tlb_one_[user|kernel]() 2018-02-15 01:15:52 +01:00
pci-calgary_64.c x86/dma: Remove dma_alloc_coherent_gfp_flags() 2018-03-20 10:01:58 +01:00
pci-dma.c x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag 2018-05-28 12:48:25 +02:00
pci-iommu_table.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pci-swiotlb.c x86/dma: Use generic swiotlb_ops 2018-03-20 10:01:57 +01:00
pcspeaker.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
perf_regs.c perf/x86: Store user space frame-pointer value on a sample 2018-05-25 08:11:12 +02:00
platform-quirks.c x86/i8237: Register device based on FADT legacy boot flag 2018-04-27 16:44:29 +02:00
pmem.c resource: Provide resource struct in resource walk callback 2017-11-07 15:35:57 +01:00
probe_roms.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
process_32.c x86/dumpstack: Add a show_ip() function 2018-04-26 16:15:27 +02:00
process_64.c x86/mm: Drop TS_COMPAT on 64-bit exec() syscall 2018-05-19 12:31:05 +02:00
process.c x86/speculation: Rework speculative_store_bypass_update() 2018-05-17 17:09:19 +02:00
ptrace.c signal: Ensure every siginfo we send has all bits initialized 2018-04-25 10:40:51 -05:00
pvclock.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
quirks.c x86/mce: Check for alternate indication of machine check recovery on Skylake 2018-06-07 22:22:12 +02:00
reboot_fixups_32.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c x86/apic: Fix restoring boot IRQ mode in reboot and kexec/kdump 2018-02-17 11:47:45 +01:00
relocate_kernel_32.S
relocate_kernel_64.S x86/kexec: Make kexec (mostly) work in 5-level paging mode 2018-01-31 08:39:40 +01:00
resource.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
rtc.c x86: Convert x86_platform_ops to timespec64 2018-05-19 14:03:14 +02:00
setup_percpu.c x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table 2018-03-01 09:48:27 +01:00
setup.c mm/pkeys, x86, powerpc: Display pkey in smaps if arch supports pkeys 2018-05-09 11:51:49 +10:00
signal_compat.c signal: Add TRAP_UNK si_code for undiagnosted trap exceptions 2018-04-25 10:40:56 -05:00
signal.c rseq: Avoid infinite recursion when delivering SIGSEGV 2018-06-22 19:04:22 +02:00
smp.c x86/tracing: Disentangle pagefault and resched IPI tracing key 2017-08-29 11:42:29 +02:00
smpboot.c Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-06-04 18:19:18 -07:00
stacktrace.c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2018-01-03 16:41:07 -08:00
step.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sys_x86_64.c compat: Move compat_timespec/ timeval to compat_time.h 2018-04-19 13:29:54 +02:00
sysfb_efi.c
sysfb_simplefb.c
sysfb.c
tboot.c x86/pti: Make unpoison of pgd for trusted boot work for real 2018-01-11 23:36:59 +01:00
tce_64.c
time.c x86/time: Unconditionally register legacy timer interrupt 2018-01-14 20:18:23 +01:00
tls.c x86/ldt: Make the LDT mapping RO 2017-12-23 21:13:01 +01:00
tls.h
topology.c
trace_clock.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
tracepoint.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
traps.c Merge branch 'linus' into x86/urgent 2018-06-22 21:20:35 +02:00
tsc_msr.c
tsc_sync.c Merge branch 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-11-13 19:07:38 -08:00
tsc.c x86/tsc: Fix mark_tsc_unstable() 2018-05-02 16:10:40 +02:00
umip.c signal: Ensure every siginfo we send has all bits initialized 2018-04-25 10:40:51 -05:00
unwind_frame.c x86/unwind: Disable unwinder warnings on 32-bit 2017-10-10 12:49:49 +02:00
unwind_guess.c x86/unwind: Add the ORC unwinder 2017-07-26 13:18:20 +02:00
unwind_orc.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
uprobes.c uprobes/x86: Remove incorrect WARN_ON() in uprobe_init_insn() 2018-06-21 17:11:02 +02:00
verify_cpu.S x86/boot: Annotate verify_cpu() as a callable function 2017-09-28 09:39:03 +02:00
vm86_32.c x86/vm86/32: Fix POPF emulation 2018-03-14 09:21:01 +01:00
vmlinux.lds.S x86/build: Remove no-op macro VMLINUX_SYMBOL() 2018-05-13 15:11:34 +02:00
vsmp_64.c x86/apic: Remove unused callbacks 2017-09-25 20:51:58 +02:00
x86_init.c xen, mm: allow deferred page initialization for xen pv domains 2018-04-11 10:28:38 -07:00