A mirror of the official Linux kernel repository just in case
Go to file
Douglas Anderson 6dcde5d5f2 watchdog/hardlockup: adopt softlockup logic avoiding double-dumps
Patch series "watchdog: Better handling of concurrent lockups".

When we get multiple lockups at roughly the same time, the output in the
kernel logs can be very confusing since the reports about the lockups end
up interleaved in the logs.  There is some code in the kernel to try to
handle this but it wasn't that complete.

Li Zhe recently made this a bit better for softlockups (specifically for
the case where `kernel.softlockup_all_cpu_backtrace` is not set) in commit
9d02330abd ("softlockup: serialized softlockup's log"), but that only
handled softlockup reports.  Hardlockup reports still had similar issues.

This series also has a small fix to avoid dumping all stacks a second time
in the case of a panic.  This is a bit unrelated to the interleaving fixes
but it does also improve the clarity of lockup reports.


This patch (of 4):

The hardlockup detector and softlockup detector both have the ability to
dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace` and
`kernel.softlockup_all_cpu_backtrace`).  Both detectors also have some
logic to attempt to avoid interleaving printouts if two CPUs were trying
to do dumps of all CPUs at the same time.  However:

- The hardlockup detector's logic still allowed interleaving some
  information. Specifically another CPU could print modules and dump
  the stack of the locked CPU at the same time we were dumping all
  CPUs.

- In the case where `kernel.hardlockup_panic` was set in addition to
  `kernel.hardlockup_all_cpu_backtrace`, when two CPUs both detected
  hardlockups at the same time the second CPU could call panic() while
  the first was still dumping stacks. This was especially bad if the
  locked up CPU wasn't responding to the request for a backtrace since
  the function nmi_trigger_cpumask_backtrace() can wait up to 10
  seconds.

Let's resolve this by adopting the softlockup logic in the hardlockup
handler.

NOTES:

- As part of this, one might think that we should make a helper
  function that both the hard and softlockup detectors call. This
  turns out not to be super trivial since it would have to be
  parameterized quite a bit since there are separate global variables
  controlling each lockup detector and they print log messages that
  are just different enough that it would be a pain. We probably don't
  want to change the messages that are printed without good reason to
  avoid throwing log parsers for a loop.

- One might also think that it would be a good idea to have the
  hardlockup and softlockup detector use the same global variable to
  prevent interleaving. This would make sure that softlockups and
  hardlockups can't interleave each other. That _almost_ works but has
  a dangerous flaw if `kernel.hardlockup_panic` is not the same as
  `kernel.softlockup_panic` because we might skip a call to panic() if
  one type of lockup was detected at the same time as another.

Link: https://lkml.kernel.org/r/20231220211640.2023645-1-dianders@chromium.org
Link: https://lkml.kernel.org/r/20231220131534.1.I4f35a69fbb124b5f0c71f75c631e11fabbe188ff@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Li Zhe <lizhe.67@bytedance.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-12-29 12:22:30 -08:00
arch x86/kexec: fix incorrect end address passed to kernel_ident_mapping_init() 2023-12-29 12:22:29 -08:00
block block-6.7-2023-12-01 2023-12-02 06:39:30 +09:00
certs This update includes the following changes: 2023-11-02 16:15:30 -10:00
crypto This push fixes a regression in ahash and hides the Kconfig sub-options for the jitter RNG. 2023-11-09 17:04:58 -08:00
Documentation docs: submit-checklist: remove all of "make namespacecheck" 2023-12-29 12:22:27 -08:00
drivers lib: crc_ccitt_false() is identical to crc_itu_t() 2023-12-29 12:22:26 -08:00
fs nilfs2: cpfile: fix some kernel-doc warnings 2023-12-29 12:22:29 -08:00
include lib: crc_ccitt_false() is identical to crc_itu_t() 2023-12-29 12:22:26 -08:00
init init/Kconfig: move more items into the EXPERT menu 2023-12-20 15:02:58 -08:00
io_uring io_uring: use fget/fput consistently 2023-11-28 11:56:29 -07:00
ipc Many singleton patches against the MM code. The patch series which are 2023-11-02 19:38:47 -10:00
kernel watchdog/hardlockup: adopt softlockup logic avoiding double-dumps 2023-12-29 12:22:30 -08:00
lib lib/trace_readwrite.c:: replace asm-generic/io with linux/io 2023-12-29 12:22:29 -08:00
LICENSES LICENSES: Add the copyleft-next-0.3.1 license 2022-11-08 15:44:01 +01:00
mm x86/kexec: use pr_err() instead of kexec_dprintk() when an error occurs 2023-12-29 12:22:28 -08:00
net wireless fixes: 2023-11-29 19:43:34 -08:00
rust Kbuild updates for v6.7 2023-11-04 08:07:19 -10:00
samples Landlock updates for v6.7-rc1 2023-11-03 09:28:53 -10:00
scripts scripts/checkstack.pl: fix no space expression between sp and offset 2023-12-29 12:22:28 -08:00
security kexec_file: print out debugging message if required 2023-12-20 15:02:57 -08:00
sound ALSA: hda: Disable power-save on KONTRON SinglePC 2023-11-30 16:14:21 +01:00
tools Revert "selftests: error out if kernel header files are not yet built" 2023-12-12 17:20:19 -08:00
usr usr/Kconfig: fix typos of "its" 2023-12-20 15:02:58 -08:00
virt ARM: 2023-09-07 13:52:20 -07:00
.clang-format iommu: Add for_each_group_device() 2023-05-23 08:15:51 +02:00
.cocciconfig
.get_maintainer.ignore get_maintainer: add Alan to .get_maintainer.ignore 2022-08-20 15:17:44 -07:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore kbuild: rpm-pkg: generate kernel.spec in rpmbuild/SPECS/ 2023-10-03 20:49:09 +09:00
.mailmap .mailmap: add a new address mapping for Chester Lin 2023-12-06 16:12:45 -08:00
.rustfmt.toml rust: add .rustfmt.toml 2022-09-28 09:02:20 +02:00
COPYING
CREDITS MAINTAINERS: remove Ohad Ben-Cohen from hwspinlock subsystem 2023-12-29 12:22:24 -08:00
Kbuild Kbuild updates for v6.1 2022-10-10 12:00:45 -07:00
Kconfig
MAINTAINERS MAINTAINERS: remove Ohad Ben-Cohen from hwspinlock subsystem 2023-12-29 12:22:24 -08:00
Makefile checkstack: allow to pass MINSTACKSIZE parameter 2023-12-10 17:21:32 -08:00
README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.