linux/Documentation
Linus Torvalds 9244724fbf A large update for SMP management:

Merge tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP updates from Thomas Gleixner:
 "A large update for SMP management:

   - Parallel CPU bringup

     The main motivation for parallel bringup is to shorten the (kexec)
     reboot time of cloud servers and thereby reduce the downtime of
     the VM tenants.

     The current fully serialized bringup does the following per AP (a
     code sketch follows the list):

       1) Prepare callbacks (allocate, initialize, create threads)
       2) Kick the AP alive (e.g. INIT/SIPI on x86)
       3) Wait for the AP to report alive state
       4) Let the AP continue through the atomic bringup
       5) Let the AP run the threaded bringup to full online state
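
     As a minimal sketch (not the actual kernel code; the helper names
     are illustrative stand-ins for the real callbacks), the serialized
     flow is a single loop that pays the full latency of every step for
     each AP in turn:

        static void __init bringup_aps_serialized(void)
        {
                unsigned int cpu;

                for_each_present_cpu(cpu) {
                        if (cpu_online(cpu))      /* skip the boot CPU */
                                continue;
                        prepare_cpu(cpu);         /* 1) callbacks, threads    */
                        kick_ap_alive(cpu);       /* 2) e.g. INIT/SIPI on x86 */
                        wait_for_ap_alive(cpu);   /* 3) 350us .. 3.5ms        */
                        ap_atomic_bringup(cpu);   /* 4) incl. microcode load  */
                        ap_threaded_bringup(cpu); /* 5) to full online state  */
                }
        }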

     There are two significant delays (a back-of-envelope sum follows
     the list):

       #3 The time for an AP to report alive state in start_secondary()
          on x86 has been measured in the range between 350us and 3.5ms
          depending on vendor, CPU type, BIOS, microcode size, etc.

       #4 The atomic bringup does the microcode update. This has been
          measured to take up to ~8ms on the primary threads, depending
          on the size of the microcode patch being applied.
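
     A back-of-envelope sum of the worst-case numbers above (assumed
     figures, for illustration only) shows how those delays add up on
     a 112-thread machine:

        ~56 primary threads x ~8ms   microcode update   ~ 450ms
        111 APs             x ~3.5ms alive-state wait   ~ 390ms
                                                  total ~ 840ms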

     On a two-socket SKL server with 56 cores (112 threads), the boot
     CPU spends about 800ms on current mainline busy-waiting for the
     APs to come up and apply microcode. That's more than 80% of the
     entire onlining procedure.

     This can be reduced significantly by splitting the bringup
     mechanism into two parts (sketched in code after the list):

       1) Run the prepare callbacks and kick the AP alive for each AP
          which needs to be brought up.

          The APs wake up, do their firmware initialization and run the
          low level kernel startup code including microcode loading in
          parallel up to the first synchronization point. (#1 and #2
          above)

       2) Run the rest of the bringup code strictly serialized per CPU
          (#3 - #5 above) as it's done today.

           Parallelizing that stage of the CPU bringup might be possible
           in theory, but it's questionable whether the required surgery
           would be justified for a pretty small gain.
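
     Expressed as a sketch (again with illustrative helper names), the
     split turns the single loop into two, so the expensive waits in
     the second loop overlap with the startup work the APs already
     performed during the first:

        static void __init bringup_aps_split(void)
        {
                unsigned int cpu;

                /*
                 * Part 1: kick every AP alive, do not wait. Each AP
                 * runs firmware init, low level startup and microcode
                 * loading in parallel up to the first sync point.
                 */
                for_each_present_cpu(cpu) {
                        if (cpu_online(cpu))
                                continue;
                        prepare_cpu(cpu);
                        kick_ap_alive(cpu);
                }

                /* Part 2: strictly serialized per CPU, as today. */
                for_each_present_cpu(cpu) {
                        if (cpu_online(cpu))
                                continue;
                        wait_for_ap_alive(cpu);   /* mostly done already    */
                        ap_atomic_bringup(cpu);   /* microcode already done */
                        ap_threaded_bringup(cpu);
                }
        }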

     If the system is large enough, the first AP is already waiting at
     the first synchronization point by the time the boot CPU has
     finished waking up the last AP. That reduces the AP bringup time
     on that SKL from ~800ms to ~80ms, i.e. by roughly a factor of 10.

     The actual gain varies wildly depending on the system, CPU,
     microcode patch size and other factors. There are some
     opportunities to reduce the overhead further, but that needs some
     deep surgery in the x86 CPU bringup code.

     For now this is only enabled on x86, but the core functionality
     obviously works for all SMP-capable architectures (an opt-in
     sketch follows).
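
     Per the shortlog below, an architecture opts in by implementing
     arch_cpuhp_init_parallel_bringup(); the following is a purely
     illustrative sketch of such an opt-in. The feature check is a
     hypothetical placeholder, not a real kernel function:

        bool __init arch_cpuhp_init_parallel_bringup(void)
        {
                /* Hypothetical platform capability check. */
                if (!platform_has_parallel_startup())
                        return false;

                /* Core code may now kick all APs before waiting. */
                return true;
        }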

   - Enhancements for SMP function call tracing so that both the
     scheduling point and the actual execution point can be located.
     That allows precise measurement of IPI delivery time"
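
For context, the calls these tracepoints instrument are cross-CPU
function calls issued through the long-standing smp_call_function_*()
family; here is a minimal sketch (remote_work() and run_on_cpu3() are
made-up example functions):

   #include <linux/smp.h>

   /* Runs on the target CPU, typically in IPI context. */
   static void remote_work(void *info)
   {
   }

   static void run_on_cpu3(void)
   {
           /*
            * Ask CPU 3 to run remote_work() and wait for completion.
            * Tracing both the queueing point and the execution point
            * makes the IPI delivery time directly measurable.
            */
           smp_call_function_single(3, remote_work, NULL, 1);
   }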

* tag 'smp-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  trace,smp: Add tracepoints for scheduling remotelly called functions
  trace,smp: Add tracepoints around remotelly called functions
  MAINTAINERS: Add CPU HOTPLUG entry
  x86/smpboot: Fix the parallel bringup decision
  x86/realmode: Make stack lock work in trampoline_compat()
  x86/smp: Initialize cpu_primary_thread_mask late
  cpu/hotplug: Fix off by one in cpuhp_bringup_mask()
  x86/apic: Fix use of X{,2}APIC_ENABLE in asm with older binutils
  x86/smpboot/64: Implement arch_cpuhp_init_parallel_bringup() and enable it
  x86/smpboot: Support parallel startup of secondary CPUs
  x86/smpboot: Implement a bit spinlock to protect the realmode stack
  x86/apic: Save the APIC virtual base address
  cpu/hotplug: Allow "parallel" bringup up to CPUHP_BP_KICK_AP_STATE
  x86/apic: Provide cpu_primary_thread mask
  x86/smpboot: Enable split CPU startup
  cpu/hotplug: Provide a split up CPUHP_BRINGUP mechanism
  cpu/hotplug: Reset task stack state in _cpu_up()
  cpu/hotplug: Remove unused state functions
  riscv: Switch to hotplug core state synchronization
  parisc: Switch to hotplug core state synchronization
  ...
2023-06-26 13:59:56 -07:00
ABI dmaengine updates for v6.4 2023-05-03 11:11:56 -07:00
accel accel/qaic: Add documentation for AIC100 accelerator driver 2023-04-06 08:23:03 +02:00
accounting Scheduler changes for v6.4: 2023-04-28 14:53:30 -07:00
admin-guide A large update for SMP management: 2023-06-26 13:59:56 -07:00
arch A handful of late-arriving documentation fixes, plus one Spanish 2023-05-05 13:16:42 -07:00
arm ARM: SoC devicetree changes for 6.4 2023-04-25 12:11:54 -07:00
arm64 irqchip/gic-v3: Work around affinity issues on ASR8601 2023-05-29 21:19:34 +01:00
block Documentation/block: drop the request.rst file 2023-05-12 11:04:58 -06:00
bpf lsm/stable-6.4 PR 20230428 2023-04-29 10:17:05 -07:00
cdrom Documentation: use capitalization for chapters and acronyms 2023-05-16 12:49:31 -06:00
core-api A large update for SMP management: 2023-06-26 13:59:56 -07:00
cpu-freq
crypto
dev-tools Mainly singleton patches all over the place. Series of note are: 2023-04-27 19:57:00 -07:00
devicetree Updates for the interrupt subsystem: 2023-06-26 13:34:39 -07:00
doc-guide
driver-api pwm: Changes for v6.4-rc1 2023-05-03 11:25:01 -07:00
fault-injection block: null_blk: make fault-injection dynamically configurable per device 2023-04-13 07:38:55 -06:00
fb
features
filesystems fsverity updates for 6.5 2023-06-26 10:56:13 -07:00
firmware_class
firmware-guide
fpga Documentation: use capitalization for chapters and acronyms 2023-05-16 12:49:31 -06:00
gpu
hid
hwmon hwmon: (aquacomputer_d5next) Add support for Aquacomputer Aquastream XT 2023-04-21 07:27:23 -07:00
i2c
iio
images
infiniband
input
isdn
kbuild parisc: update kbuild doc. aliases for parisc64 2023-05-03 17:43:10 +02:00
kernel-hacking Documentation: Add document for false sharing 2023-04-10 16:46:11 -06:00
leds - New Drivers 2023-05-02 10:36:02 -07:00
litmus-tests LKMM scripting updates for v6.4 2023-04-24 12:02:25 -07:00
livepatch Objtool changes for v6.4: 2023-04-28 14:02:54 -07:00
locking Documentation: use capitalization for chapters and acronyms 2023-05-16 12:49:31 -06:00
loongarch
maintainer
mhi
mips
misc-devices
mm mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM 2023-05-29 16:14:28 +01:00
netlabel
netlink netlink: specs: ethtool: fix random typos 2023-06-06 18:42:20 -07:00
networking net/ipv4: ping_group_range: allow GID from 2147483648 to 4294967294 2023-06-02 09:55:22 +01:00
nvdimm
nvme
PCI
pcmcia Documentation: use capitalization for chapters and acronyms 2023-05-16 12:49:31 -06:00
peci
power regulator: consumer.rst: fix 'regulator_enable' typo. 2023-04-27 21:55:38 +01:00
powerpc
process rust: upgrade to Rust 1.68.2 2023-05-31 17:35:03 +02:00
RCU doc: Update whatisRCU.rst 2023-04-05 13:47:18 +00:00
riscv Documentation: RISC-V: patch-acceptance: mention patchwork's role 2023-06-14 07:44:11 -07:00
rust docs: rust: point directly to the standalone installers 2023-05-31 18:52:35 +02:00
s390 s390/iommu: get rid of S390_CCW_IOMMU and S390_AP_IOMMU 2023-05-17 15:20:18 +02:00
scheduler sh updates for v6.4 2023-04-27 17:41:23 -07:00
scsi
security lsm: move hook comments docs to security/security.c 2023-04-28 11:58:34 -04:00
sound ALSA: docs: Fix code block indentation in ALSA driver example 2023-05-03 08:08:25 +02:00
sphinx
sphinx-static
spi
staging Documentation: use capitalization for chapters and acronyms 2023-05-16 12:49:31 -06:00
target
timers Documentation: use capitalization for chapters and acronyms 2023-05-16 12:49:31 -06:00
tools rtla/timerlat: Add auto-analysis only option 2023-04-25 19:26:17 -04:00
trace tracing/user_events: Document auto-cleanup and remove dyn_event refs 2023-06-14 13:43:27 -04:00
translations docs: zh_CN/devicetree: sync usage-model fix 2023-06-08 07:31:59 -06:00
usb
userspace-api cifs: correct references in Documentation to old fs/cifs path 2023-05-24 16:29:21 -05:00
virt VFIO updates for v6.4-rc1 2023-05-02 11:56:43 -07:00
w1
watchdog
.gitignore
atomic_bitops.txt
atomic_t.txt
Changes
CodingStyle
conf.py docs: turn off "smart quotes" in the HTML build 2023-04-20 17:53:18 -06:00
docutils.conf
dontdiff
index.rst
Kconfig
Makefile
memory-barriers.txt
SubmittingPatches
subsystem-apis.rst