linux/arch
Lee, Chun-Yi e3c41e37b0 x86/kexec: Fix kexec crash in syscall kexec_file_load()
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:

    BUG: unable to handle kernel paging request at ffffc90613fc9000
    IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260

The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.

This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because the
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.

Here's how I found the bug:

After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
the code uses walk_system_ram_res() to fill-in crash memory regions information
to the program header, so it counts those small memory regions that
reside in a page area.

But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filters out small regions.

I printed those small memory regions, for example:

  kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0

Based on the code in walk_system_ram_range(), this memory region
will be filtered out:

  pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
  end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
  end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE

So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.

This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.

Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02 09:13:06 +02:00
..
alpha PCI updates for v4.3: 2015-09-25 11:16:53 -07:00
arc genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
arm Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm 2015-09-27 06:48:48 -04:00
arm64 AMD fixes for bugs introduced in the 4.2 merge window, 2015-09-25 10:51:40 -07:00
avr32 genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
blackfin genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
c6x genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
cris CRISv10: delete unused lib/dmacopy.c 2015-09-05 00:56:51 +02:00
frv PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code" 2015-09-15 13:18:04 -05:00
h8300 dma-mapping: consolidate dma_set_mask 2015-09-10 13:29:01 -07:00
hexagon Merge branch 'akpm' (patches from Andrew) 2015-09-10 18:19:42 -07:00
ia64 PCI updates for v4.3: 2015-09-25 11:16:53 -07:00
m32r lib/decompressors: use real out buf size for gunzip with kernel 2015-09-10 13:29:01 -07:00
m68k genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
metag genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
microblaze PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code" 2015-09-15 13:18:04 -05:00
mips PCI updates for v4.3: 2015-09-25 11:16:53 -07:00
mn10300 PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code" 2015-09-15 13:18:04 -05:00
nios2 nios2: add Max10 defconfig 2015-09-08 18:16:02 +08:00
openrisc dma-mapping: consolidate dma_set_mask 2015-09-10 13:29:01 -07:00
parisc parisc: Use platform_device_register_simple("rtc-generic") 2015-09-08 17:53:48 +02:00
powerpc PCI updates for v4.3: 2015-09-25 11:16:53 -07:00
s390 AMD fixes for bugs introduced in the 4.2 merge window, 2015-09-25 10:51:40 -07:00
score Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:04:50 -07:00
sh genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
sparc genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
tile genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
um Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:04:50 -07:00
unicore32 genirq: Remove irq argument from irq flow handlers 2015-09-16 15:47:51 +02:00
x86 x86/kexec: Fix kexec crash in syscall kexec_file_load() 2015-10-02 09:13:06 +02:00
xtensa PCI: Revert "PCI: Call pci_read_bridge_bases() from core instead of arch code" 2015-09-15 13:18:04 -05:00
.gitignore
Kconfig kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00