A mirror of the official Linux kernel repository just in case
Go to file
Borislav Petkov 9a90ed065a x86/thermal: Fix LVT thermal setup for SMI delivery mode
There are machines out there with added value crap^WBIOS which provide an
SMI handler for the local APIC thermal sensor interrupt. Out of reset,
the BSP on those machines has something like 0x200 in that APIC register
(timestamps left in because this whole issue is timing sensitive):

  [    0.033858] read lvtthmr: 0x330, val: 0x200

which means:

 - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
 - bits [10:8] have 010b which means SMI delivery mode.

Now, later during boot, when the kernel programs the local APIC, it
soft-disables it temporarily through the spurious vector register:

  setup_local_APIC:

  	...

	/*
	 * If this comes from kexec/kcrash the APIC might be enabled in
	 * SPIV. Soft disable it before doing further initialization.
	 */
	value = apic_read(APIC_SPIV);
	value &= ~APIC_SPIV_APIC_ENABLED;
	apic_write(APIC_SPIV, value);

which means (from the SDM):

"10.4.7.2 Local APIC State After It Has Been Software Disabled

...

* The mask bits for all the LVT entries are set. Attempts to reset these
bits will be ignored."

And this happens too:

  [    0.124111] APIC: Switch to symmetric I/O mode setup
  [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
  [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0

This results in CPU 0 soft lockups depending on the placement in time
when the APIC soft-disable happens. Those soft lockups are not 100%
reproducible and the reason for that can only be speculated as no one
tells you what SMM does. Likely, it confuses the SMM code that the APIC
is disabled and the thermal interrupt doesn't doesn't fire at all,
leading to CPU 0 stuck in SMM forever...

Now, before

  4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")

due to how the APIC_LVTTHMR was read before APIC initialization in
mcheck_intel_therm_init(), it would read the value with the mask bit 16
clear and then intel_init_thermal() would replicate it onto the APs and
all would be peachy - the thermal interrupt would remain enabled.

But that commit moved that reading to a later moment in
intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
late and with its interrupt mask bit set.

Thus, revert back to the old behavior of reading the thermal LVT
register before the APIC gets initialized.

Fixes: 4f432e8bb1 ("x86/mce: Get rid of mcheck_intel_therm_init()")
Reported-by: James Feeney <james@nurealm.net>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
2021-05-31 22:32:26 +02:00
arch x86/thermal: Fix LVT thermal setup for SMI delivery mode 2021-05-31 22:32:26 +02:00
block block-5.13-2021-05-22 2021-05-22 07:40:34 -10:00
certs Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
crypto for-5.13/drivers-2021-04-27 2021-04-28 14:39:37 -07:00
Documentation powerpc fixes for 5.13 #4 2021-05-23 06:07:33 -10:00
drivers x86/thermal: Fix LVT thermal setup for SMI delivery mode 2021-05-31 22:32:26 +02:00
fs userfaultfd: hugetlbfs: fix new flag usage in error path 2021-05-22 15:09:07 -10:00
include linux/bits.h: fix compilation error with GENMASK 2021-05-22 15:09:07 -10:00
init Merge branch 'akpm' (patches from Andrew) 2021-05-07 00:34:51 -07:00
ipc ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry 2021-05-22 15:09:07 -10:00
kernel Two locking fixes: 2021-05-23 06:30:08 -10:00
lib lib: kunit: suppress a compilation warning of frame size 2021-05-22 15:09:07 -10:00
LICENSES LICENSES: Add the CC-BY-4.0 license 2020-12-08 10:33:27 -07:00
mm userfaultfd: hugetlbfs: fix new flag usage in error path 2021-05-22 15:09:07 -10:00
net Char/misc driver fixes for 5.13-rc3 2021-05-20 06:31:52 -10:00
samples Kbuild updates for v5.13 (2nd) 2021-05-08 10:00:11 -07:00
scripts kbuild: dummy-tools: adjust to stricter stackprotector check 2021-05-17 12:10:03 +09:00
security trusted-keys: match tpm_get_ops on all return paths 2021-05-12 22:36:37 +03:00
sound sound fixes for 5.13-rc3 2021-05-20 06:42:21 -10:00
tools powerpc fixes for 5.13 #4 2021-05-23 06:07:33 -10:00
usr .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
virt kvm: Cap halt polling at kvm->max_halt_poll_ns 2021-05-07 06:06:22 -04:00
.clang-format cxl for 5.12 2021-02-24 09:38:36 -08:00
.cocciconfig
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: ignore only top-level modules.builtin 2021-05-02 00:43:35 +09:00
.mailmap Merge drm/drm-fixes into drm-misc-fixes 2021-05-11 13:35:52 +02:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: move Murali Karicheri to credits 2021-04-29 15:47:30 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS proc: remove Alexey from MAINTAINERS 2021-05-22 15:09:07 -10:00
Makefile Linux 5.13-rc3 2021-05-23 11:42:48 -10:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.