No functional change in this patch. arch_debugfs_dir is the generic kernel
name declared in linux/debugfs.h for arch-specific debugfs directory.
Architectures like x86/s390 already use the name. Rename powerpc
specific powerpc_debugfs_root to arch_debugfs_dir.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210812132831.233794-2-aneesh.kumar@linux.ibm.com
Sparse says:
arch/powerpc/kernel/fadump.c:48:16: warning: symbol 'fadump_kobj' was not declared. Should it be static?
arch/powerpc/kernel/fadump.c:55:27: warning: symbol 'crash_mrange_info' was not declared. Should it be static?
arch/powerpc/kernel/fadump.c:61:27: warning: symbol 'reserved_mrange_info' was not declared. Should it be static?
arch/powerpc/kernel/fadump.c:83:12: warning: symbol 'fadump_cma_init' was not declared. Should it be static?
And indeed none of them are used outside this file, they can all be made
static. Also fadump_kobj needs to be moved inside the ifdef where it's
used.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210421125402.1955013-1-mpe@ellerman.id.au
sfr reports that the allyesconfig build fails with:
arch/powerpc/kernel/fadump.c: In function 'crash_fadump':
arch/powerpc/kernel/fadump.c:731:28: error: 'INTERRUPT_SYSTEM_RESET' undeclared
731 | if (TRAP(&(fdh->regs)) == INTERRUPT_SYSTEM_RESET) {
Add an include of interrupt.h to fix it.
Fixes: 7153d4bf0b ("powerpc/traps: Enhance readability for trap types")
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[mpe: Reformat change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210419191425.281dc58a@canb.auug.org.au
Define macros to list ppc interrupt types in interttupt.h, replace the
reference of the trap hex values with these macros.
Referred the hex numbers in arch/powerpc/kernel/exceptions-64e.S,
arch/powerpc/kernel/exceptions-64s.S, arch/powerpc/kernel/head_*.S,
arch/powerpc/kernel/head_booke.h and arch/powerpc/include/asm/kvm_asm.h.
Signed-off-by: Xiongwei Song <sxwjean@gmail.com>
[mpe: Resolve conflicts in nmi_disables_ftrace(), fix 40x build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1618398033-13025-1-git-send-email-sxwjean@me.com
If fadump_calculate_reserve_size() is not inlined, there is a modpost
warning:
WARNING: modpost: vmlinux.o(.text+0x5196c): Section mismatch in
reference from the function fadump_calculate_reserve_size() to the
function .init.text:parse_crashkernel()
The function fadump_calculate_reserve_size() references
the function __init parse_crashkernel().
This is often because fadump_calculate_reserve_size lacks a __init
annotation or the annotation of parse_crashkernel is wrong.
fadump_calculate_reserve_size() calls parse_crashkernel(), which is
marked as __init and fadump_calculate_reserve_size() is called from
within fadump_reserve_mem(), which is also marked as __init.
Mark fadump_calculate_reserve_size() as __init to fix the section
mismatch. Additionally, remove the inline keyword as it is not necessary
to inline this function; the compiler is still free to do so if it feels
it is worthwhile since commit 889b3c1245 ("compiler: remove
CONFIG_OPTIMIZE_INLINING entirely").
Fixes: 11550dc0a0 ("powerpc/fadump: reuse crashkernel parameter for fadump memory reservation")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/1300
Link: https://lore.kernel.org/r/20210302195013.2626335-1-nathan@kernel.org
- A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting it for
powerpc, as well as a related fix for sparc.
- Remove support for PowerPC 601.
- Some fixes for watchpoints & addition of a new ptrace flag for detecting ISA
v3.1 (Power10) watchpoint features.
- A fix for kernels using 4K pages and the hash MMU on bare metal Power9
systems with > 16TB of RAM, or RAM on the 2nd node.
- A basic idle driver for shallow stop states on Power10.
- Tweaks to our sched domains code to better inform the scheduler about the
hardware topology on Power9/10, where two SMT4 cores can be presented by
firmware as an SMT8 core.
- A series doing further reworks & cleanups of our EEH code.
- Addition of a filter for RTAS (firmware) calls done via sys_rtas(), to
prevent root from overwriting kernel memory.
- Other smaller features, fixes & cleanups.
Thanks to:
Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Athira Rajeev, Biwen
Li, Cameron Berkenpas, Cédric Le Goater, Christophe Leroy, Christoph Hellwig,
Colin Ian King, Daniel Axtens, David Dai, Finn Thain, Frederic Barrat, Gautham
R. Shenoy, Greg Kurz, Gustavo Romero, Ira Weiny, Jason Yan, Joel Stanley,
Jordan Niethe, Kajol Jain, Konrad Rzeszutek Wilk, Laurent Dufour, Leonardo
Bras, Liu Shixin, Luca Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar,
Nathan Lynch, Nicholas Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver
O'Halloran, Pedro Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai,
Qinglang Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott
Cheloha, Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
Yingliang, zhengbin.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl+JBQoTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJJAD/0e3tsFP+9rFlxKSJlDcMW3w7kXDRXE
tG40F1ubYFLU8wtFVR0De3njTRsz5HyaNU6SI8CwPq48mCa7OFn1D1OeHonHXDX9
w6v3GE2S1uXXQnjm+czcfdjWQut0IwWBLx007/S23WcPff3Abc2irupKLNu+Gx29
b/yxJHZSRJVX59jSV94HkdJS75mDHQ3oUOlFGXtuGcUZDufpD1ynRcQOjr0V/8JU
F4WAblFSe7hiczHGqIvfhFVJ+OikEhnj2aEMAL8U7vxzrAZ7RErKCN9s/0Tf0Ktx
FzNEFNLHZGqh+qNDpKKmM+RnaeO2Lcoc9qVn7vMHOsXPzx9F5LJwkI/DgPjtgAq/
mFvGnQB/FapATnQeMluViC/qhEe5bQXLUfPP5i2+QOjK0QqwyFlUMgaVNfsY8jRW
0Q/sNA72Opzst4WUTveCd4SOInlUuat09e5nLooCRLW7u7/jIiXNRSFNvpOiwkfF
EcIPJsi6FUQ4SNbqpRSNEO9fK5JZrrUtmr0pg8I7fZhHYGcxEjqPR6IWCs3DTsak
4/KhjhhTnP/IWJRw6qKAyNhEyEwpWqYZ97SIQbvSb1g/bS47AIdQdJRb0eEoRjhx
sbbnnYFwPFkG4c1yQSIFanT9wNDQ2hFx/c/mRfbd7J+ordx9JsoqXjqrGuhsU/pH
GttJLmkJ5FH+pQ==
=akeX
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting
it for powerpc, as well as a related fix for sparc.
- Remove support for PowerPC 601.
- Some fixes for watchpoints & addition of a new ptrace flag for
detecting ISA v3.1 (Power10) watchpoint features.
- A fix for kernels using 4K pages and the hash MMU on bare metal
Power9 systems with > 16TB of RAM, or RAM on the 2nd node.
- A basic idle driver for shallow stop states on Power10.
- Tweaks to our sched domains code to better inform the scheduler about
the hardware topology on Power9/10, where two SMT4 cores can be
presented by firmware as an SMT8 core.
- A series doing further reworks & cleanups of our EEH code.
- Addition of a filter for RTAS (firmware) calls done via sys_rtas(),
to prevent root from overwriting kernel memory.
- Other smaller features, fixes & cleanups.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe
Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn
Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero,
Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad
Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca
Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas
Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro
Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang
Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha,
Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
Yingliang, zhengbin.
* tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
selftests/powerpc: Fix eeh-basic.sh exit codes
cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
powerpc/time: Make get_tb() common to PPC32 and PPC64
powerpc/time: Make get_tbl() common to PPC32 and PPC64
powerpc/time: Remove get_tbu()
powerpc/time: Avoid using get_tbl() and get_tbu() internally
powerpc/time: Make mftb() common to PPC32 and PPC64
powerpc/time: Rename mftbl() to mftb()
powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
powerpc/32s: Rename head_32.S to head_book3s_32.S
powerpc/32s: Setup the early hash table at all time.
powerpc/time: Remove ifdef in get_dec() and set_dec()
powerpc: Remove get_tb_or_rtc()
powerpc: Remove __USE_RTC()
powerpc: Tidy up a bit after removal of PowerPC 601.
powerpc: Remove support for PowerPC 601
powerpc: Remove PowerPC 601
powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
powerpc: Remove CONFIG_PPC601_SYNC_FIX
...
There are several occurrences of the following pattern:
for_each_memblock(memory, reg) {
start = __pfn_to_phys(memblock_region_memory_base_pfn(reg);
end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));
/* do something with start and end */
}
Using for_each_mem_range() iterator is more appropriate in such cases and
allows simpler and cleaner code.
[akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
[rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are several occurrences of the following pattern:
for_each_memblock(memory, reg) {
start_pfn = memblock_region_memory_base_pfn(reg);
end_pfn = memblock_region_memory_end_pfn(reg);
/* do something with start_pfn and end_pfn */
}
Rather than iterate over all memblock.memory regions and each time query
for their start and end PFNs, use for_each_mem_pfn_range() iterator to get
simpler and clearer code.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Baoquan He <bhe@redhat.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> [.clang-format]
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Emil Renner Berthing <kernel@esmil.dk>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix gcc '-Wunused-but-set-variable' warning:
arch/powerpc/kernel/fadump.c: In function fadump_update_elfcore_header:
arch/powerpc/kernel/fadump.c:790:17: warning: variable elf set but not used [-Wunused-but-set-variable]
It is introduced by commit ebaeb5ae24 ("fadump:
Convert firmware-assisted cpu state dump data into elf notes."),
but never used, so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1574144074-142032-2-git-send-email-zhengbin13@huawei.com
skiroot_defconfig fails:
arch/powerpc/kernel/fadump.c:48:17: error: ‘cpus_in_fadump’ defined but not used
48 | static atomic_t cpus_in_fadump;
Fix it by moving the definition into the #ifdef where it's used.
Fixes: ba608c4fa1 ("powerpc/fadump: fix race between pstore write and fadump crash trigger")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200727070341.595634-1-mpe@ellerman.id.au
When we enter into fadump crash path via system reset we fail to update
the pstore.
On the system reset path we first update the pstore then we go for fadump
crash. But the problem here is when all the CPUs try to get the pstore
lock to initiate the pstore write, only one CPUs will acquire the lock
and proceed with the pstore write. Since it in NMI context CPUs that fail
to get lock do not wait for their turn to write to the pstore and simply
proceed with the next operation which is fadump crash. One of the CPU who
proceeded with fadump crash path triggers the crash and does not wait for
the CPU who gets the pstore lock to complete the pstore update.
Timeline diagram to depicts the sequence of events that leads to an
unsuccessful pstore update when we hit fadump crash path via system reset.
1 2 3 ... n CPU Threads
| | | |
| | | |
Reached to -->|--->|---->| ----------->|
system reset | | | |
path | | | |
| | | |
Try to -->|--->|---->|------------>|
acquire the | | | |
pstore lock | | | |
| | | |
| | | |
Got the -->| +->| | |<-+
pstore lock | | | | | |--> Didn't get the
| --------------------------+ lock and moving
| | | | ahead on fadump
| | | | crash path
| | | |
Begins the -->| | | |
process to | | | |<-- Got the chance to
update the | | | | trigger the crash
pstore | -> | | ... <- |
| | | | | |
| | | | | |<-- Triggers the
| | | | | | crash
| | | | | | ^
| | | | | | |
Writing to -->| | | | | | |
pstore | | | | | | |
| | |
^ |__________________| |
| CPU Relax |
| |
+-----------------------------------------+
|
v
Race: crash triggered before pstore
update completes
To avoid this race condition a barrier is added on crash_fadump path, it
prevents the CPU to trigger the crash until all the online CPUs completes
their task.
A barrier is added to make sure all the secondary CPUs hit the
crash_fadump function before we initiates the crash. A timeout is kept to
ensure the primary CPU (one who initiates the crash) do not wait for
secondary CPUs indefinitely.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200713052435.183750-1-sourabhjain@linux.ibm.com
If the memory chunk found for reserving memory overshoots the memory
limit imposed, do not proceed with reserving memory. Default behavior
was this until commit 140777a3d8 ("powerpc/fadump: consider reserved
ranges while reserving memory") changed it unwittingly.
Fixes: 140777a3d8 ("powerpc/fadump: consider reserved ranges while reserving memory")
Cc: stable@vger.kernel.org
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/159057266320.22331.6571453892066907320.stgit@hbathini.in.ibm.com
Commit 0962e8004e ("powerpc/prom: Scan reserved-ranges node for
memory reservations") enabled support to parse reserved-ranges DT
node and reserve kernel memory falling in these ranges for F/W
purposes. Memory reserved for FADump should not overlap with these
ranges as it could corrupt memory meant for F/W or crash'ed kernel
memory to be exported as vmcore.
But since commit 579ca1a276 ("powerpc/fadump: make use of memblock's
bottom up allocation mode"), memblock_find_in_range() is being used to
find the appropriate area to reserve memory for FADump, which can't
account for reserved-ranges as these ranges are reserved only after
FADump memory reservation.
With reserved-ranges now being populated during early boot, look out
for these memory ranges while reserving memory for FADump. Without
this change, MPIPL on PowerNV systems aborts with hostboot failure,
when memory reserved for FADump is less than 4096MB.
Fixes: 579ca1a276 ("powerpc/fadump: make use of memblock's bottom up allocation mode")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/158737297693.26700.16193820746269425424.stgit@hbathini.in.ibm.com
At times, memory ranges have to be looked up during early boot, when
kernel couldn't be initialized for dynamic memory allocation. In fact,
reserved-ranges look up is needed during FADump memory reservation.
Without accounting for reserved-ranges in reserving memory for FADump,
MPIPL boot fails with memory corruption issues. So, extend memory
ranges handling to support static allocation and populate reserved
memory ranges during early boot.
Fixes: dda9dbfeeb ("powerpc/fadump: consider reserved ranges while releasing memory")
Cc: stable@vger.kernel.org
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/158737294432.26700.4830263187856221314.stgit@hbathini.in.ibm.com
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200209105901.1620958-1-gregkh@linuxfoundation.org
Add a sys interface to allow querying the memory reserved by FADump for
saving the crash dump.
Also added Documentation/ABI for the new sysfs file.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191211160910.21656-7-sourabhjain@linux.ibm.com
As the number of FADump sysfs files increases it is hard to manage all
of them inside /sys/kernel directory. It's better to have all the
FADump related sysfs files in a dedicated directory
/sys/kernel/fadump. But in order to maintain backward compatibility a
symlink has been added for every sysfs that has moved to new location.
As the FADump sysfs files are now part of a dedicated directory there
is no need to prefix their name with fadump_, hence sysfs file names
are also updated. For example fadump_enabled sysfs file is now
referred as enabled.
Also consolidate ABI documentation for all the FADump sysfs files in a
single file Documentation/ABI/testing/sysfs-kernel-fadump.
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Tested-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191211160910.21656-4-sourabhjain@linux.ibm.com
Currently it is not possible to distinguish the case when fadump is
supported by firmware and disabled in kernel and completely unsupported
using the kernel sysfs interface. User can investigate the devicetree
but it is more reasonable to provide sysfs files in case we get some
fadumpv2 in the future.
With this patch sysfs files are available whenever fadump is supported
by firmware.
There is duplicate message about lack of support by firmware in
fadump_reserve_mem and setup_fadump. Remove the duplicate message in
setup_fadump.
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191107164757.15140-1-msuchanek@suse.de
With support to copy multiple kernel boot memory regions owing to copy
size limitation, also handle holes in the memory area to be preserved.
Support as many as 128 kernel boot memory regions. This allows having
an adequate FADump capture kernel size for different scenarios.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821385448.5656.6124791213910877759.stgit@hbathini.in.ibm.com
RMA_START is defined as '0' and there is even a BUILD_BUG_ON() to
make sure it is never anything else. Remove this macro and use '0'
instead as code change is needed anyway when it has to be something
else. Also, remove unused RMA_END macro.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821384096.5656.15026984053970204652.stgit@hbathini.in.ibm.com
OPAL loads kernel & initrd at 512MB offset (256MB size), also exported
as ibm,opal/dump/fw-load-area. So, if boot memory size of FADump is
less than 768MB, kernel memory to be exported as '/proc/vmcore' would
be overwritten by f/w while loading kernel & initrd. To avoid such a
scenario, enforce a minimum boot memory size of 768MB on OPAL platform
and skip using FADump if a newer F/W version loads kernel & initrd
above 768MB.
Also, irrespective of RMA size, set the minimum boot memory size
expected on pseries platform at 320MB. This is to avoid inflating the
minimum memory requirements on systems with 512M/1024M RMA size.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821381414.5656.1592867278535469652.stgit@hbathini.in.ibm.com
Add a new kernel config option, CONFIG_PRESERVE_FA_DUMP that ensures
that crash data, from previously crash'ed kernel, is preserved. This
helps in cases where FADump is not enabled but the subsequent memory
preserving kernel boot is likely to process this crash data. One
typical usecase for this config option is petitboot kernel.
As OPAL allows registering address with it in the first kernel and
retrieving it after MPIPL, use it to store the top of boot memory.
A kernel that intends to preserve crash data retrieves it and avoids
using memory beyond this address.
Move arch_reserved_kernel_pages() function as it is needed for both
FA_DUMP and PRESERVE_FA_DUMP configurations.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821375751.5656.11459483669542541602.stgit@hbathini.in.ibm.com
The size parameter to fadump_reserve_crash_area() function is not needed
as all the memory above boot memory size must be preserved anyway. Update
the function by dropping this redundant parameter.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821374440.5656.2945512543806951766.stgit@hbathini.in.ibm.com
Commit 0962e8004e ("powerpc/prom: Scan reserved-ranges node for
memory reservations") enabled support to parse 'reserved-ranges' DT
node to reserve kernel memory falling in these ranges for firmware
purposes. Along with the preserved area memory, ensure memory in
reserved ranges is not overlapped with memory released by capture
kernel aftering saving vmcore. Also, fix the off-by-one error in
fadump_release_reserved_area function while releasing memory.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821371358.5656.6061214942558818661.stgit@hbathini.in.ibm.com
Make allocate_crash_memory_ranges() and free_crash_memory_ranges()
functions generic to reuse them for memory management of all types of
dynamic memory range arrays. This change helps in memory management
of reserved ranges array to be added later.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821369863.5656.4375667005352155892.stgit@hbathini.in.ibm.com
Earlier, memblock_find_in_range() was not used to find the memory to
be reserved for FADump as bottom up allocation mode was not supported.
But since commit 79442ed189 ("mm/memblock.c: introduce bottom-up
allocation mode") bottom up allocation mode is supported for memblock.
So, use it to find the memory to be reserved for FADump.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821364211.5656.14336025460336135194.stgit@hbathini.in.ibm.com
Make OPAL call to indicate that the dump is processed and the metadata
area in OPAL can be cleared/released. Also, setup/initialize FADump
for re-registration.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821356046.5656.12270927048195494911.stgit@hbathini.in.ibm.com
During kexec boot, metadata address needs to be reset to avoid running
into errors interpreting stale metadata address, in case the kexec'ed
kernel crashes before metadata address could be setup again.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821346629.5656.10783321582005237813.stgit@hbathini.in.ibm.com
OPAL allows registering address with it in the first kernel and
retrieving it after MPIPL. Setup kernel metadata and register its
address with OPAL to use it for processing the crash dump.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821345011.5656.13567765019032928471.stgit@hbathini.in.ibm.com
Except for Reserved dump area (see Documentation/powerpc/firmware-
assisted-dump.rst) which is permanent reserved, all memory above boot
memory size, where boot memory size is the memory required for the
kernel to boot successfully when booted with restricted memory (memory
for capture kernel), is released when the dump is invalidated. Make
this a bit more explicit in the code.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821336092.5656.1079046285366041687.stgit@hbathini.in.ibm.com
Move platform specific register/un-register code, the RTAS calls, to
register/un-register callback functions. This would also mean moving
code that initializes and prints the platform specific FADump data.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821332856.5656.16380417702046411631.stgit@hbathini.in.ibm.com
Introduce callback functions for platform specific operations like
register, unregister, invalidate & such. Also, define place-holders
for the same on pSeries platform.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821330286.5656.15538934400074110770.stgit@hbathini.in.ibm.com
Currently, FADump is only supported on pSeries but that is going to
change soon with FADump support being added on PowerNV platform. So,
move rtas specific definitions to platform code to allow FADump
to have multiple platforms support.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821328494.5656.16219929140866195511.stgit@hbathini.in.ibm.com
Add helper functions to setup & free CPU notes buffer and to find if a
given memory area is contiguous. Also, use boolean as return type for
the function that finds if boot memory area is contiguous. While at
it, save the virtual address of CPU notes buffer instead of physical
address as virtual address is used often.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821318971.5656.9281936950510635858.stgit@hbathini.in.ibm.com
Though asm/fadump.h is meant to be used by other components dealing
with FADump, it also has macros/definitions internal to FADump code.
Move them to a new header file used within FADump code. This also
makes way for refactoring platform specific FADump code.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/156821313134.5656.6597770626574392140.stgit@hbathini.in.ibm.com
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1334 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.113240726@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No need to have this in asm/page.h, move it into asm/hugetlb.h
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For fadump to work successfully there should not be any holes in reserved
memory ranges where kernel has asked firmware to move the content of old
kernel memory in event of crash. Now that fadump uses CMA for reserved
area, this memory area is now not protected from hot-remove operations
unless it is cma allocated. Hence, fadump service can fail to re-register
after the hot-remove operation, if hot-removed memory belongs to fadump
reserved region. To avoid this make sure that memory from fadump reserved
area is not hot-removable if fadump is registered.
However, if user still wants to remove that memory, he can do so by
manually stopping fadump service before hot-remove operation.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
fadump fails to register when there are holes in reserved memory area.
This can happen if user has hot-removed a memory that falls in the
fadump reserved memory area. Throw a meaningful error message to the
user in such case.
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: is_reserved_memory_area_contiguous() returns bool, unsplit string]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
One of the primary issues with Firmware Assisted Dump (fadump) on Power
is that it needs a large amount of memory to be reserved. On large
systems with TeraBytes of memory, this reservation can be quite
significant.
In some cases, fadump fails if the memory reserved is insufficient, or
if the reserved memory was DLPAR hot-removed.
In the normal case, post reboot, the preserved memory is filtered to
extract only relevant areas of interest using the makedumpfile tool.
While the tool provides flexibility to determine what needs to be part
of the dump and what memory to filter out, all supported distributions
default this to "Capture only kernel data and nothing else".
We take advantage of this default and the Linux kernel's Contiguous
Memory Allocator (CMA) to fundamentally change the memory reservation
model for fadump.
Instead of setting aside a significant chunk of memory nobody can use,
this patch uses CMA instead, to reserve a significant chunk of memory
that the kernel is prevented from using (due to MIGRATE_CMA), but
applications are free to use it. With this fadump will still be able
to capture all of the kernel memory and most of the user space memory
except the user pages that were present in CMA region.
Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
[root@zzxx-yy10 ~]# free -m
total used free shared buff/cache available
Mem: 7557 193 6822 12 541 6725
Swap: 4095 0 4095
With this patch:
[root@zzxx-yy10 ~]# free -m
total used free shared buff/cache available
Mem: 8133 194 7464 12 475 7338
Swap: 4095 0 4095
Changes made here are completely transparent to how fadump has
traditionally worked.
Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
CMA and its usage.
TODO:
- Handle case where CMA reservation spans nodes.
Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Firmware-Assisted Dump (FADump) needs to be registered again after any
memory hot add/remove operation to update the crash memory ranges. But
currently, the kernel returns '-EEXIST' if we try to register without
uregistering it first. This could expose the system to racing issues
while unregistering and registering FADump from userspace during udev
events. Spare the userspace of this and let it be taken care of in the
kernel space for a simpler interface.
Since this change, running 'echo 1 > /sys/kernel/fadump_registered'
would result in re-regisering (unregistering and registering) FADump,
if it was already registered.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Acked-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Commit 1bd6a1c4b8 ("powerpc/fadump: handle crash memory ranges array
index overflow") changed crash memory ranges to a dynamic array that
is reallocated on-demand with krealloc(). The relevant header for this
call was not included. The kernel compiles though. But be cautious and
add the header anyway.
Also, memory allocation logic in fadump_add_crash_memory() takes care
of memory allocation for crash memory ranges in all scenarios. Drop
unnecessary memory allocation in fadump_setup_crash_memory_ranges().
Fixes: 1bd6a1c4b8 ("powerpc/fadump: handle crash memory ranges array index overflow")
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
With dynamic memory allocation support for crash memory ranges array,
there is no hard limit on the no. of crash memory ranges kernel could
export, but program headers count could overflow in the /proc/vmcore
ELF file while exporting each memory range as PT_LOAD segment. Reduce
the likelihood of a such scenario, by folding adjacent crash memory
ranges which minimizes the total number of PT_LOAD segments.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e ("memblock: Add array resizing support").
On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in crash or other system failures
like below:
task: c00007f39a290010 ti: c00000000b738000 task.ti: c00000000b738000
NIP: c000000000047df4 LR: c0000000000f9e58 CTR: c00000000010f180
REGS: c00000000b73b570 TRAP: 0300 Tainted: G L X (4.4.140+)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22004484 XER: 20000000
CFAR: c000000000008500 DAR: 000007a450000000 DSISR: 40000000 SOFTE: 0
...
NIP [c000000000047df4] smp_send_reschedule+0x24/0x80
LR [c0000000000f9e58] resched_curr+0x138/0x160
Call Trace:
resched_curr+0x138/0x160 (unreliable)
check_preempt_curr+0xc8/0xf0
ttwu_do_wakeup+0x38/0x150
try_to_wake_up+0x224/0x4d0
__wake_up_common+0x94/0x100
ep_poll_callback+0xac/0x1c0
__wake_up_common+0x94/0x100
__wake_up_sync_key+0x70/0xa0
sock_def_readable+0x58/0xa0
unix_stream_sendmsg+0x2dc/0x4c0
sock_sendmsg+0x68/0xa0
___sys_sendmsg+0x2cc/0x2e0
__sys_sendmsg+0x5c/0xc0
SyS_socketcall+0x36c/0x3f0
system_call+0x3c/0x100
as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, dynamically
allocate memory for crash memory ranges and resize it incrementally,
in units of pagesize, on hitting array size limit.
Fixes: 2df173d9e8 ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: stable@vger.kernel.org # v3.4+
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Reviewed-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
[mpe: Just use PAGE_SIZE directly, fixup variable placement]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>