mirror of
https://github.com/torvalds/linux.git
synced 2024-12-29 06:12:08 +00:00
a771ea6413
39425 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
a771ea6413 |
Power management updates for 5.20-rc1
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar). - Drop unnecessary CPU hotplug locking from store() used by cpufreq sysfs attributes (Viresh Kumar). - Make the ACPI cpufreq driver support the boost control interface on Zhaoxin/Centaur processors (Tony W Wang-oc). - Print a warning message on attempts to free an active cpufreq policy which should never happen (Viresh Kumar). - Fix grammar in the Kconfig help text for the loongson2 cpufreq driver (Randy Dunlap). - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq governor (Zhao Liu). - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll cpuidle driver (Eiichi Tsukata). - Modify intel_idle to treat C1 and C1E as independent idle states on Sapphire Rapids (Artem Bityutskiy). - Extend support for wakeirq to callback wrappers used during system suspend and resume (Ulf Hansson). - Defer waiting for device probe before loading a hibernation image till the first actual device access to avoid possible deadlocks reported by syzbot (Tetsuo Handa). - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn Helgaas). - Add Raptor Lake-P to the list of processors supported by the Intel RAPL driver (George D Sworo). - Add Alder Lake-N and Raptor Lake-P to the list of processors for which Power Limit4 is supported in the Intel RAPL driver (Sumeet Pawnikar). - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before attempting to remove it (Hsin-Yi Wang). - Change the Energy Model code to represent power in micro-Watts and adjust its users accordingly (Lukasz Luba). - Add new devfreq driver for Mediatek CCI (Cache Coherent Interconnect) (Johnson Wang). - Convert the Samsung Exynos SoC Bus bindings to DT schema of exynos-bus.c (Krzysztof Kozlowski). - Address kernel-doc warnings by adding the description for unused fucntion parameters in devfreq core (Mauro Carvalho Chehab). - Use NULL to pass a null pointer rather than zero according to the function propotype in imx-bus.c (Colin Ian King). - Print error message instead of error interger value in tegra30-devfreq.c (Dmitry Osipenko). - Add checks to prevent setting negative frequency QoS limits for CPUs (Shivnandan Kumar). - Update the pm-graph suite of utilities to the latest revision 5.9 including multiple improvements (Todd Brandt). - Drop pme_interrupt reference from the PCI power management documentation (Mario Limonciello). -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLoKy8SHHJqd0Byand5 c29ja2kubmV0AAoJEILEb/54YlRx3+oQAJNVU+W14EaRPWXQRMuwBC5zk3hb6T9q JqmMd8coEd+9/4ABAeRAWso1B26rUzB6JyBvw3lGH9OXInpYmvnJEhEPrTpK2h0D U9HxEARuGJolrDm0X9NAkn7tKKMC9GnvPS9z2s7s+N97VFFWC/QiU+PHB0SypGNb JxRfbVJZQCuxmNG9UeK+xeHFQ9lM2Z9ZdTxR71G0n7nQPPR+sUvnFufFby3Aogf3 XnBYfia+YNqkUlefxxwB5a0cFwPXOUGsQkIf4d64gZnq1TgZ+71kht1GEF08PDFl wV8v1rOWuXEae8dozuf5xszp/eVyAqzgB+IShT9APREOO3Wg6I16XdBm8R1TGwCK JTdZqnm6HVKBNqchEwYViJILX69rrNUT+AwHBWhtKKDNh3qeTuwi/JGTeDVN++en xf3TNKx3LV31Nq6nWJFzDGLehfZMnAPkhfYohUBI7FNyblpk4mJRVcZ0bYI7UNnS als77uoipvb5KdFCtdhxYBHd/y867NvXKa1qsAuDxusAsfJHf4SnlMdbgOepBH2y jJg06CGrMDU3TZ8BL+WpqUYk4irQnAMs/159Txh7A6/dOnTjE7S9NHrENCwmt2og FrHSLH1eLX6Sa4RSibiGHPC7mNULP2/TOtryf3zFdlIVcjm3NEU3bnfzx7nlJn05 8t6ObMxgMhWT =XeLV -----END PGP SIGNATURE----- Merge tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These are mostly minor improvements all over including new CPU IDs for the Intel RAPL driver, an Energy Model rework to use micro-Watt as the power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq updates, documentation cleanups and a new version of the pm-graph suite of utilities. Specifics: - Make cpufreq_show_cpus() more straightforward (Viresh Kumar). - Drop unnecessary CPU hotplug locking from store() used by cpufreq sysfs attributes (Viresh Kumar). - Make the ACPI cpufreq driver support the boost control interface on Zhaoxin/Centaur processors (Tony W Wang-oc). - Print a warning message on attempts to free an active cpufreq policy which should never happen (Viresh Kumar). - Fix grammar in the Kconfig help text for the loongson2 cpufreq driver (Randy Dunlap). - Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq governor (Zhao Liu). - Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll cpuidle driver (Eiichi Tsukata). - Modify intel_idle to treat C1 and C1E as independent idle states on Sapphire Rapids (Artem Bityutskiy). - Extend support for wakeirq to callback wrappers used during system suspend and resume (Ulf Hansson). - Defer waiting for device probe before loading a hibernation image till the first actual device access to avoid possible deadlocks reported by syzbot (Tetsuo Handa). - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn Helgaas). - Add Raptor Lake-P to the list of processors supported by the Intel RAPL driver (George D Sworo). - Add Alder Lake-N and Raptor Lake-P to the list of processors for which Power Limit4 is supported in the Intel RAPL driver (Sumeet Pawnikar). - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before attempting to remove it (Hsin-Yi Wang). - Change the Energy Model code to represent power in micro-Watts and adjust its users accordingly (Lukasz Luba). - Add new devfreq driver for Mediatek CCI (Cache Coherent Interconnect) (Johnson Wang). - Convert the Samsung Exynos SoC Bus bindings to DT schema of exynos-bus.c (Krzysztof Kozlowski). - Address kernel-doc warnings by adding the description for unused function parameters in devfreq core (Mauro Carvalho Chehab). - Use NULL to pass a null pointer rather than zero according to the function propotype in imx-bus.c (Colin Ian King). - Print error message instead of error interger value in tegra30-devfreq.c (Dmitry Osipenko). - Add checks to prevent setting negative frequency QoS limits for CPUs (Shivnandan Kumar). - Update the pm-graph suite of utilities to the latest revision 5.9 including multiple improvements (Todd Brandt). - Drop pme_interrupt reference from the PCI power management documentation (Mario Limonciello)" * tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits) powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P PM: QoS: Add check to make sure CPU freq is non-negative PM: hibernate: defer device probing when resuming from hibernation intel_idle: make SPR C1 and C1E be independent cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask cpufreq: loongson2: fix Kconfig "its" grammar pm-graph v5.9 cpufreq: Warn users while freeing active policy cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1 firmware: arm_scmi: Get detailed power scale from perf Documentation: EM: Switch to micro-Watts scale PM: EM: convert power field to micro-Watts precision and align drivers PM / devfreq: tegra30: Add error message for devm_devfreq_add_device() PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero PM / devfreq: shut up kernel-doc warnings dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver dt-bindings: interconnect: Add MediaTek CCI dt-bindings PM: domains: Ensure genpd_debugfs_dir exists before remove PM: runtime: Extend support for wakeirq for force_suspend|resume ... |
||
Linus Torvalds
|
9de1f9c8ca |
Updates for interrupt core and drivers:
core: - Fix a few inconsistencies between UP and SMP vs. interrupt affinities - Small updates and cleanups all over the place drivers: - New driver for the LoongArch interrupt controller - New driver for the Renesas RZ/G2L interrupt controller - Hotpath optimization for SiFive PLIC - Workaround for broken PLIC edge triggered interrupts - Simall cleanups and improvements as usual -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmLn5agTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoV2HD/4u0+09Fd8Awt1Knnb4CInmwFihZ/bu EiS1Air+MEJ/fyFb5sT/Dn8YdUWYA6a3ifpLMGBwrKCcb5pMwPEtI8uqjSmtgsN/ 2Z4o3N5v6EgM25CtrHNBrXK0E9Rz5Py49gm5p3K7+h4g63z9JwrM4G0Bvr8+krLS EV9IZU6kVmGC6gnG/MspkArsLk1rCM0PU0SJ2lEPsWd1fjhVSDfunvy/qnnzXRzz wjrcAf+a2Kgb1TMnpL6tx9d2Xx8KrKfODZTdOmPHrdv58F0EbJzapJnAVkYZDPtR LE2kQc2Qhdawx0kgNNNhvu9P6oZtpnK9N7KAhDQdw17sgrRygINjAMSEe2RykYL1 lK+lJOIzfyd2JkEuC/8w1ZezL88S0EaTNawrkxjJ8L3fa7WDbwilCC+1w95QydCv sQB137OaLKgWetcRsht9PLWFb4ujkWzxoPf2cPPsm81EzCicNtBuNPLReBTcNrWJ X2VPpbaqRK8t8bnkXRqhahbq7f8c86feoICHfA4c7T4eZUp/Oq6T8aNvf6WPgjae I2/FO6kxZj3CQqFkhJGhiZRtGZdx6HLCsL84A+2Ktsra+D8+/qecZCnkHYtz0Vo6 aFuGg+Wj+zuc2QfdaWwG8Dh5dijbxgHGHhzbh9znsWzytN9gfoBxuvBejf65i6sC In63mEkv35ttfA== =OnhF -----END PGP SIGNATURE----- Merge tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "Updates for interrupt core and drivers: Core: - Fix a few inconsistencies between UP and SMP vs interrupt affinities - Small updates and cleanups all over the place New drivers: - LoongArch interrupt controller - Renesas RZ/G2L interrupt controller Updates: - Hotpath optimization for SiFive PLIC - Workaround for broken PLIC edge triggered interrupts - Simall cleanups and improvements as usual" * tag 'irq-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) irqchip/mmp: Declare init functions in common header file irqchip/mips-gic: Check the return value of ioremap() in gic_of_init() genirq: Use for_each_action_of_desc in actions_show() irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch irqchip: Add LoongArch CPU interrupt controller support irqchip: Add Loongson Extended I/O interrupt controller support irqchip/loongson-liointc: Add ACPI init support irqchip/loongson-pch-msi: Add ACPI init support irqchip/loongson-pch-pic: Add ACPI init support irqchip: Add Loongson PCH LPC controller support LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain LoongArch: Use ACPI_GENERIC_GSI for gsi handling genirq/generic_chip: Export irq_unmap_generic_chip ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback APCI: irq: Add support for multiple GSI domains LoongArch: Provisionally add ACPICA data structures irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains irqdomain: Report irq number for NOMAP domains irqchip/gic-v3: Fix comment typo dt-bindings: interrupt-controller: renesas,rzg2l-irqc: Document RZ/V2L SoC ... |
||
Linus Torvalds
|
63e6053add |
Perf events updates for this cycle are:
- Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs. - Use Intel large-PEBS hardware feature in more circumstances, to reduce PMI overhead & reduce sampling data. - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant, which tells tooling the exact number of samples lost. - Add new IBS register bits definitions. - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLn5MARHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1jAWA/+N48UX35dD0u3k5S2zdYJRHzQkdbivVGc dOuCB3XTJYneaSI5byQkI4Xo8LUuMbF4q2Zi3/+XhTaqn2zYPP65D6ACL5hU9Shh F95TnLWbedIaxSJmjMCsWDlwBob8WgtLhokWvyq+ks66BqaDoBKHRtn+2fi0rwZb MbuN0199Gx/EicWEOeUGBSxoeKbjSix0BApqy+CuXC0DC3+3iwIPk4dbNfHXpHYs nqxjQKhJnoxdlgjiOY3UuYhdCZl1cuQFIu2Ce1N2nXCAgR2FeQD7ZqtcaA2TnsAO 9BwRfLljavzHhOoz0zALR42kF+eOcnH5K9pIBx7py9Hjdmdsx88fUCovWK34MdG5 KTuqiMWNLIUvoP9WBjl7wUtl2+vcjr9XwgCdneOO+zoNsk44qSRyer1RpEP6D9UM e9HvdXBVRzhnIhK9NYugeLJ+3nxvFL+OLvc3ZfUrtm04UzeetCBxMlvMv3y021V7 0fInZjhzh4Dz2tJgNlG7AKXkXlsHlyj6/BH9uKc9wVokK+94g5mbspxW8R4gKPr2 l06pYB7ttSpp26sq9vl5ASHO0rniiYAPsQcr7Ko3y72mmp6kfIe/HzYNhCEvgYe2 6JJ8F9kPgRuKr0CwGvUzxFwBC7PJR80zUtZkRCIpV+rgxQcNmK5YXp/KQFIjQqkI rJfEaDOshl0= =DqaA -----END PGP SIGNATURE----- Merge tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events updates from Ingo Molnar: - Fix Intel Alder Lake PEBS memory access latency & data source profiling info bugs. - Use Intel large-PEBS hardware feature in more circumstances, to reduce PMI overhead & reduce sampling data. - Extend the lost-sample profiling output with the PERF_FORMAT_LOST ABI variant, which tells tooling the exact number of samples lost. - Add new IBS register bits definitions. - AMD uncore events: Add PerfMonV2 DF (Data Fabric) enhancements. * tag 'perf-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/ibs: Add new IBS register bits into header perf/x86/intel: Fix PEBS data source encoding for ADL perf/x86/intel: Fix PEBS memory access info encoding for ADL perf/core: Add a new read format to get a number of lost samples perf/x86/amd/uncore: Add PerfMonV2 RDPMC assignments perf/x86/amd/uncore: Add PerfMonV2 DF event format perf/x86/amd/uncore: Detect available DF counters perf/x86/amd/uncore: Use attr_update for format attributes perf/x86/amd/uncore: Use dynamic events array x86/events/intel/ds: Enable large PEBS for PERF_SAMPLE_WEIGHT_TYPE |
||
Linus Torvalds
|
22a39c3d86 |
This was a fairly quiet cycle for the locking subsystem:
- lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives that can lose the lock_type & cause false reports. No such mishap was observed in the wild. - jump_label improvements: simplify the cross-arch support of initial NOP patching by making it arch-specific code (used on MIPS only), and remove the s390 initial NOP patching that was superfluous. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLn3jERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hzeg/7BTC90XeMANhTiL23iiH7dOYZwqdFeB12 VBqdaPaGC8i+mJzVAdGyPFwCFDww6Ak6P33PcHkemuIO5+DhWis8hfw5krHEOO1k AyVSMOZuWJ8/g6ZenjgNFozQ8C+3NqURrpdqN55d7jhMazPWbsNLLqUgvSSqo6DY Ah2O+EKrDfGNCxT6/YaTAmUryctotxafSyFDQxv3RKPfCoIIVv9b3WApYqTOqFIu VYTPr+aAcMsU20hPMWQI4kbQaoCxFqr3bZiZtAiS/IEunqi+PlLuWjrnCUpLwVTC +jOCkNJHt682FPKTWelUnCnkOg9KhHRujRst5mi1+2tWAOEvKltxfe05UpsZYC3b jhzddREMwBt3iYsRn65LxxsN4AMK/C/41zjejHjZpf+Q5kwDsc6Ag3L5VifRFURS KRwAy9ejoVYwnL7CaVHM2zZtOk4YNxPeXmiwoMJmOufpdmD1LoYbNUbpSDf+goIZ yPJpxFI5UN8gi8IRo3DMe4K2nqcFBC3wFn8tNSAu+44gqDwGJAJL6MsLpkLSZkk8 3QN9O11UCRTJDkURjoEWPgRRuIu9HZ4GKNhiblDy6gNM/jDE/m5OG4OYfiMhojgc KlMhsPzypSpeApL55lvZ+AzxH8mtwuUGwm8lnIdZ2kIse1iMwapxdWXWq9wQr8eW jLWHgyZ6rcg= =4B89 -----END PGP SIGNATURE----- Merge tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "This was a fairly quiet cycle for the locking subsystem: - lockdep: Fix a handful of the more complex lockdep_init_map_*() primitives that can lose the lock_type & cause false reports. No such mishap was observed in the wild. - jump_label improvements: simplify the cross-arch support of initial NOP patching by making it arch-specific code (used on MIPS only), and remove the s390 initial NOP patching that was superfluous" * tag 'locking-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix lockdep_init_map_*() confusion jump_label: make initial NOP patching the special case jump_label: mips: move module NOP patching into arch code jump_label: s390: avoid pointless initial NOP patching |
||
Linus Torvalds
|
b167fdffe9 |
This cycle's scheduler updates for v6.0 are:
Load-balancing improvements: ============================ - Improve NUMA balancing on AMD Zen systems for affine workloads. - Improve the handling of reduced-capacity CPUs in load-balancing. - Energy Model improvements: fix & refine all the energy fairness metrics (PELT), and remove the conservative threshold requiring 6% energy savings to migrate a task. Doing this improves power efficiency for most workloads, and also increases the reliability of energy-efficiency scheduling. - Optimize/tweak select_idle_cpu() to spend (much) less time searching for an idle CPU on overloaded systems. There's reports of several milliseconds spent there on large systems with large workloads ... [ Since the search logic changed, there might be behavioral side effects. ] - Improve NUMA imbalance behavior. On certain systems with spare capacity, initial placement of tasks is non-deterministic, and such an artificial placement imbalance can persist for a long time, hurting (and sometimes helping) performance. The fix is to make fork-time task placement consistent with runtime NUMA balancing placement. Note that some performance regressions were reported against this, caused by workloads that are not memory bandwith limited, which benefit from the artificial locality of the placement bug(s). Mel Gorman's conclusion, with which we concur, was that consistency is better than random workload benefits from non-deterministic bugs: "Given there is no crystal ball and it's a tradeoff, I think it's better to be consistent and use similar logic at both fork time and runtime even if it doesn't have universal benefit." - Improve core scheduling by fixing a bug in sched_core_update_cookie() that caused unnecessary forced idling. - Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs for newly woken tasks. - Fix a newidle balancing bug that introduced unnecessary wakeup latencies. ABI improvements/fixes: ======================= - Do not check capabilities and do not issue capability check denial messages when a scheduler syscall doesn't require privileges. (Such as increasing niceness.) - Add forced-idle accounting to cgroups too. - Fix/improve the RSEQ ABI to not just silently accept unknown flags. (No existing tooling is known to have learned to rely on the previous behavior.) - Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags. Optimizations: ============== - Optimize & simplify leaf_cfs_rq_list() - Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg(). Misc fixes & cleanups: ====================== - Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems. - Fix a full-NOHZ bug that can in some cases result in the tick not being re-enabled when the last SCHED_RT task is gone from a runqueue but there's still SCHED_OTHER tasks around. - Various PREEMPT_RT related fixes. - Misc cleanups & smaller fixes. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmLn2ywRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1iNfxAAhPJMwM4tYCpIM6PhmxKiHl6kkiT2tt42 HhEmiJVLjczLybWaWwmGA2dSFkv1f4+hG7nqdZTm9QYn0Pqat2UTSRcwoKQc+gpB x85Hwt2IUmnUman52fRl5r1miH9LTdCI6agWaFLQae5ds1XmOugFo52t2ahax+Gn dB8LxS2fa/GrKj229EhkJSPWAK4Y94asoTProwpKLuKEeXhDkqUNrOWbKhz+wEnA pVZySpA9uEOdNLVSr1s0VB6mZoh5/z6yQefj5YSNntsG71XWo9jxKCIm5buVdk2U wjdn6UzoTThOy/5Ygm64eYRexMHG71UamF1JYUdmvDeUJZ5fhG6RD0FECUQNVcJB Msu2fce6u1AV0giZGYtiooLGSawB/+e6MoDkjTl8guFHi/peve9CezKX1ZgDWPfE eGn+EbYkUS9RMafXCKuEUBAC1UUqAavGN9sGGN1ufyR4za6ogZplOqAFKtTRTGnT /Ne3fHTtvv73DLGW9ohO5vSS2Rp7zhAhB6FunhibhxCWlt7W6hA4Ze2vU9hf78Yn SJDLAJjOEilLaKUkRG/d9uM3FjKJM1tqxuT76+sUbM0MNxdyiKcviQlP1b8oq5Um xE1KNZUevnr/WXqOTGDKHH/HNPFgwxbwavMiP7dNFn8h/hEk4t9dkf5siDmVHtn4 nzDVOob1LgE= =xr2b -----END PGP SIGNATURE----- Merge tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Load-balancing improvements: - Improve NUMA balancing on AMD Zen systems for affine workloads. - Improve the handling of reduced-capacity CPUs in load-balancing. - Energy Model improvements: fix & refine all the energy fairness metrics (PELT), and remove the conservative threshold requiring 6% energy savings to migrate a task. Doing this improves power efficiency for most workloads, and also increases the reliability of energy-efficiency scheduling. - Optimize/tweak select_idle_cpu() to spend (much) less time searching for an idle CPU on overloaded systems. There's reports of several milliseconds spent there on large systems with large workloads ... [ Since the search logic changed, there might be behavioral side effects. ] - Improve NUMA imbalance behavior. On certain systems with spare capacity, initial placement of tasks is non-deterministic, and such an artificial placement imbalance can persist for a long time, hurting (and sometimes helping) performance. The fix is to make fork-time task placement consistent with runtime NUMA balancing placement. Note that some performance regressions were reported against this, caused by workloads that are not memory bandwith limited, which benefit from the artificial locality of the placement bug(s). Mel Gorman's conclusion, with which we concur, was that consistency is better than random workload benefits from non-deterministic bugs: "Given there is no crystal ball and it's a tradeoff, I think it's better to be consistent and use similar logic at both fork time and runtime even if it doesn't have universal benefit." - Improve core scheduling by fixing a bug in sched_core_update_cookie() that caused unnecessary forced idling. - Improve wakeup-balancing by allowing same-LLC wakeup of idle CPUs for newly woken tasks. - Fix a newidle balancing bug that introduced unnecessary wakeup latencies. ABI improvements/fixes: - Do not check capabilities and do not issue capability check denial messages when a scheduler syscall doesn't require privileges. (Such as increasing niceness.) - Add forced-idle accounting to cgroups too. - Fix/improve the RSEQ ABI to not just silently accept unknown flags. (No existing tooling is known to have learned to rely on the previous behavior.) - Depreciate the (unused) RSEQ_CS_FLAG_NO_RESTART_ON_* flags. Optimizations: - Optimize & simplify leaf_cfs_rq_list() - Micro-optimize set_nr_{and_not,if}_polling() via try_cmpxchg(). Misc fixes & cleanups: - Fix the RSEQ self-tests on RISC-V and Glibc 2.35 systems. - Fix a full-NOHZ bug that can in some cases result in the tick not being re-enabled when the last SCHED_RT task is gone from a runqueue but there's still SCHED_OTHER tasks around. - Various PREEMPT_RT related fixes. - Misc cleanups & smaller fixes" * tag 'sched-core-2022-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rseq: Kill process when unknown flags are encountered in ABI structures rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags sched/core: Fix the bug that task won't enqueue into core tree when update cookie nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt() sched/core: Always flush pending blk_plug sched/fair: fix case with reduced capacity CPU sched/core: Use try_cmpxchg in set_nr_{and_not,if}_polling sched/core: add forced idle accounting for cgroups sched/fair: Remove the energy margin in feec() sched/fair: Remove task_util from effective utilization in feec() sched/fair: Use the same cpumask per-PD throughout find_energy_efficient_cpu() sched/fair: Rename select_idle_mask to select_rq_mask sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() sched/fair: Decay task PELT values during wakeup migration sched/fair: Provide u64 read for 32-bits arch helper sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg sched: only perform capability check on privileged operation sched: Remove unused function group_first_cpu() sched/fair: Remove redundant word " *" selftests/rseq: check if libc rseq support is registered ... |
||
Mathieu Desnoyers
|
c17a6ff932 |
rseq: Kill process when unknown flags are encountered in ABI structures
rseq_abi()->flags and rseq_abi()->rseq_cs->flags 29 upper bits are currently unused. The current behavior when those bits are set is to ignore them. This is not an ideal behavior, because when future features will start using those flags, if user-space fails to correctly validate that the kernel indeed supports those flags (e.g. with a new sys_rseq flags bit) before using them, it may incorrectly assume that the kernel will handle those flags way when in fact those will be silently ignored on older kernels. Validating that unused flags bits are cleared will allow a smoother transition when those flags will start to be used by allowing applications to fail early, and obviously, when they attempt to use the new flags on an older kernel that does not support them. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20220622194617.1155957-2-mathieu.desnoyers@efficios.com |
||
Mathieu Desnoyers
|
0190e4198e |
rseq: Deprecate RSEQ_CS_FLAG_NO_RESTART_ON_* flags
The pretty much unused RSEQ_CS_FLAG_NO_RESTART_ON_* flags introduce complexity in rseq, and are subtly buggy [1]. Solving those issues requires introducing additional complexity in the rseq implementation for each supported architecture. Considering that it complexifies the rseq ABI, I am proposing that we deprecate those flags. [2] So far there appears to be consensus from maintainers of user-space projects impacted by this feature that its removal would be a welcome simplification. [3] The deprecation approach proposed here is to issue WARN_ON_ONCE() when encountering those flags and kill the offending process with sigsegv. This should allow us to quickly identify whether anyone yells at us for removing this. Link: https://lore.kernel.org/lkml/20220618182515.95831-1-minhquangbui99@gmail.com/ [1] Link: https://lore.kernel.org/lkml/258546133.12151.1655739550814.JavaMail.zimbra@efficios.com/ [2] Link: https://lore.kernel.org/lkml/87pmj1enjh.fsf@email.froward.int.ebiederm.org/ [3] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/lkml/20220622194617.1155957-1-mathieu.desnoyers@efficios.com |
||
Linus Torvalds
|
89caf57540 |
- Update the mitigations= kernel param documentation
- Check the IBPB feature flag before enabling IBPB in firmware calls because cloud vendors' fantasy when it comes to creating guest configurations is unlimited - Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV doesn't need it anymore - Remove dead CONFIG_* items -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLmVtEACgkQEsHwGGHe VUoPnBAApfqJMYSnevjBqhiO7W/8s1GDkbvzZD/qHwQKIiTSNZWmB1QGaBJLmPWr 6UvsFq3ElxFkg7rovHKYV197cHZlldWNt6BC2mDUESAHZb8HMw38e0IUcxbOJHZq DnLVxcek3VkDG8THGSoY+NX3lvcvTx+w5C7o2SZnjBxhBYMBEXWP14UvoVAWV+HT /vEcHi3jkYiNwyTtQFdszIxF5u5qMo2qV24hiTZDYFHBBsEGTRxVRgo4kHBQlQ/t 3AxrW01Ut4zunqKlXG0wXncF1aSgfsb7XplR9bqfWz9eQzFHkZ0DqqfoCXQZRQZo nYQQT/A/hY2rm/HFBZ329hDm6fnu+u/8FzaBGm3DUp9UWGLqxFcCqH+QtKmpJXhr wTK/7mB2Baw0lhc110LhDLLFydI8smQwfPf8B9IzR3Ij7j9OYqO8+NFwNR+tMk+J VWl5aFafzVEQcf7gBGVsu/sRkxc05VtEohOV25J9VHDzlaBCMCvCpoGKfwntpp0h 9xaWUNE9/P1ggbRcxUHVmdnDnoNn087hqUBOO7GOX/cnFvADMjL3h0GqvZinj/wI 8BbpTxAU8i5qodJcsnnzxtzekxzKk6KhcHo/sMULyVSAeDnTfaPIkyfE3b6Pxiam U1QFTWPqV9371u26dnF0bYsg+UEJasuuth8noybVwej+MJvapts= =fEYI -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Update the 'mitigations=' kernel param documentation - Check the IBPB feature flag before enabling IBPB in firmware calls because cloud vendors' fantasy when it comes to creating guest configurations is unlimited - Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV doesn't need it anymore - Remove dead CONFIG_* items * tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV" x86/configs: Update configs in x86_debug.config |
||
Linus Torvalds
|
5e4823e6da |
- Avoid rwsem lockups in certain situations when handling the handoff bit
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLmUPkACgkQEsHwGGHe VUqgow/+Oj8acqImjR1OGW0MGW5F4OBRxPlWYGRBem0PwtysKSOUEuLKFGrfUPP8 9/o/WDK7sKm0A0Ph4++zyuxQVUdww1kWR1BaOzBBJZMhB3dYk511JW2EZc7TPQg8 qnBWOh1WGztaIATImo1JtN7GVlz6mWEq5i7CkyYWOfqqgMMfzS5N548KtFs37k1F GPwR2fntThsgYlL7+5ekHVBabx3Lf5CvpUkct484LtIrvO9xvBr+R5fzxdkd/j7s xGVFpt0sMEGjnOatLP+Q41E6n4Vugzjk9FdxOAYLcSl8NPGj/7HUtXB0oLcU7jSn eFxr2vurueVxpueNieBKJNiSicFsgx+QNsEtERtzLfyosgKtDkWtl5cP6k7qzqVm 9KGAWc5tiQJ5DcIoxf+pKBEXBnf6EKFS7PrknYFTbWPFnbun0nw4OnFLufUgeg9c qB6afbWUOwKLWYIcJZadmnvmE2ZhaPAv1KPvqeE7E8ln5ERbg2UKY4qV37bvyJFg N+gVv+acSip4KtGswGUBKFriJ/vvN1dh/PiBqqJC3AHwlz+CxYsOVgpk9tkhlaQ9 1HsQ51hyN/pb688J9SshqZf2BH3qS6Kz4eLa1eXGPEywsRBJfg4lufncn1JbrCg8 CzkUfVPbS31LahMDs5U3IWGSiYSUsy1JDRLZ2zns9ZEMaaZWPKQ= =SBw2 -----END PGP SIGNATURE----- Merge tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Avoid rwsem lockups in certain situations when handling the handoff bit * tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter |
||
Waiman Long
|
6eebd5fb20 |
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit |
||
Linus Torvalds
|
4b20426d04 |
wq fixes for v5.19-rc8
Just one commit to suppress a spurious warning added during the 5.19 cycle. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYuQfNg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGdjFAQDAPPlHskr1oC6d2k2nqPNEzEpOq1LWLxRK/hR2 dddxsgD+KV0GMGb43W5Au2lbscze1WNM9jeanpofRoyV+l1gyQA= =hlX7 -----END PGP SIGNATURE----- Merge tag 'wq-for-5.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "Just one commit to suppress a spurious warning added during the 5.19 cycle" * tag 'wq-for-5.19-rc8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Avoid a false warning in unbind_workers() |
||
Lai Jiangshan
|
46a4d679ef |
workqueue: Avoid a false warning in unbind_workers()
Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible
fails and trigger the false warning.
Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.
It is very easy to trigger the warning:
Set wq_unbound_cpumask to a small set of CPUs.
Offline all the CPUs of wq_unbound_cpumask.
Offline an extra CPU and trigger the warning.
Fixes:
|
||
Rafael J. Wysocki
|
aa727b7b4b |
Merge branches 'pm-devfreq', 'pm-qos', 'pm-tools' and 'pm-docs'
Merge devfreq changes, PM QoS change, and power management tools and documentation changes for v5.20-rc1: - Add new devfreq driver for Mediatek CCI (Cache Coherent Interconnect) (Johnson Wang). - Convert the Samsung Exynos SoC Bus bindings to DT schema of exynos-bus.c (Krzysztof Kozlowski). - Address kernel-doc warnings by adding the description for unused fucntion parameters in devfreq core (Mauro Carvalho Chehab). - Use NULL to pass a null pointer rather than zero according to the function propotype in imx-bus.c (Colin Ian King). - Print error message instead of error interger value in tegra30-devfreq.c (Dmitry Osipenko). - Add checks to prevent setting negative frequency QoS limits for CPUs (Shivnandan Kumar). - Update the pm-graph suite of utilities to the latest revision 5.9 including multiple improvements (Todd Brandt). - Drop pme_interrupt reference from the PCI power management documentation (Mario Limonciello). * pm-devfreq: PM / devfreq: tegra30: Add error message for devm_devfreq_add_device() PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero PM / devfreq: shut up kernel-doc warnings dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver dt-bindings: interconnect: Add MediaTek CCI dt-bindings * pm-qos: PM: QoS: Add check to make sure CPU freq is non-negative * pm-tools: pm-graph v5.9 * pm-docs: Documentation: PM: Drop pme_interrupt reference |
||
Rafael J. Wysocki
|
954a83fc60 |
Merge branches 'pm-core', 'pm-sleep', 'powercap', 'pm-domains' and 'pm-em'
Merge core device power management changes for v5.20-rc1: - Extend support for wakeirq to callback wrappers used during system suspend and resume (Ulf Hansson). - Defer waiting for device probe before loading a hibernation image till the first actual device access to avoid possible deadlocks reported by syzbot (Tetsuo Handa). - Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn Helgaas). - Add Raptor Lake-P to the list of processors supported by the Intel RAPL driver (George D Sworo). - Add Alder Lake-N and Raptor Lake-P to the list of processors for which Power Limit4 is supported in the Intel RAPL driver (Sumeet Pawnikar). - Make pm_genpd_remove() check genpd_debugfs_dir against NULL before attempting to remove it (Hsin-Yi Wang). - Change the Energy Model code to represent power in micro-Watts and adjust its users accordingly (Lukasz Luba). * pm-core: PM: runtime: Extend support for wakeirq for force_suspend|resume * pm-sleep: PM: hibernate: defer device probing when resuming from hibernation PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP * powercap: powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P powercap: intel_rapl: Add support for RAPTORLAKE_P * pm-domains: PM: domains: Ensure genpd_debugfs_dir exists before remove * pm-em: cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1 firmware: arm_scmi: Get detailed power scale from perf Documentation: EM: Switch to micro-Watts scale PM: EM: convert power field to micro-Watts precision and align drivers |
||
Linus Torvalds
|
e64ab2dbd8 |
watch_queue: Fix missing locking in add_watch_to_object()
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Fixes:
|
||
David Howells
|
e0339f036e |
watch_queue: Fix missing rcu annotation
Since __post_watch_notification() walks wlist->watchers with only the
RCU read lock held, we need to use RCU methods to add to the list (we
already use RCU methods to remove from the list).
Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
hlist_add_head() for that list.
Fixes:
|
||
Thomas Gleixner
|
779fda86bd |
irqchip/genirq updates for 5.20:
* Core code update: - Non-SMP IRQ affinity fixes, allowing UP kernel to behave similarly to SMP ones for the purpose of interrupt affinity - Let irq_set_chip_handler_name_locked() take a const struct irq_chip * - Tidy-up the NOMAP irqdomain API variant - Teach action_show() to use for_each_action_of_desc() - Make irq_chip_request_resources_parent() allow the parent callback to be optional - Remove dynamic allocations from populate_parent_alloc_arg() * New drivers: - Merge the long awaited IRQ support for the LoongArch architecture, with the provisional ACPICA update (to be reverted once the official support lands) - New Renesas RZ/G2L IRQC driver, equipped with its companion GPIO driver * Driver updates - Optimise the hot path operations for the SiFive PLIC, trading the locking for per-CPU priority masking masking operations which are apparently faster - Work around broken PLIC implementations that deal pretty badly with edge-triggered interrupts. Flag two implementations as affected. - Simplify the irq-stm32-exti driver, particularly the table that remaps the interrupts from exti to the GIC, reducing the memory usage - Convert the ocelot irq_chip to being immutable - Check ioremap() return value in the MIPS GIC driver - Move MMP driver init function declarations into the common .h - The obligatory typo fixes -----BEGIN PGP SIGNATURE----- iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmLhi0EPHG1hekBrZXJu ZWwub3JnAAoJECPQ0LrRPXpDI+wP/2BPABqCwu7JAmue8hHtpGweVkEBNulaS1K+ 1v/ElU8E1P8ppn1AVmu0lwgmAWiTtPuVWT21+AUbfOjQQ/MYKkegkH4KLmK93qSi Dn3MEiYv8WYsEV4yrJ7Il7fuuzr1iHIhIfhg3tMxDAJx47lzZH0o8nVoNFqXD1Ro WUFab/qTAOxJ/I53R9nrpx/yRf5dVRFUAZznrabYKpc/CiD/X+RLcHkbiybbRERu n3xwEtGMA2faCUgifKhsXoNUaW9mZLaufoF/F3J3Dyt7jNB/TTmdnxqWo6e4/rtd +Ut0MlH0W7bUdHGiVl1E90hDQ3yuBykUpKlTfMoYWOxeTqAx0bPYjGIuh6ajrAIy +fXWcK89KimOGB+aLC0cR5YrzvShHnH1G2qlrQg3pAXporigAXfZvzhnouRBsVKO RfnQHNsHSQHXTEu1u2HjMt7AXtmy/SoJENuPPUrtLfojg8b3aupwOvPLVx7w1Ok0 5TKZ2yhOHdskfr3lCPisVPKK0KZ+QZhDdBkd319JkxR5/iA/tzMeMTzKslruhd2U Ug6hYhKNE2kKkBBBMcEVCHAuuq94DnU/q6l458UTSkkBmvq5cMMSz5Fs0kMwGFRc q/DncpKQnrPKGrwiilj1uGgOWO2vec8KfMJUYtKzSM/QELbKUvF7yzZeIjUQxiDO KqlWNczX =E5fZ -----END PGP SIGNATURE----- Merge tag 'irqchip-5.20' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core Pull irqchip/genirq updates from Marc Zyngier: * Core code update: - Non-SMP IRQ affinity fixes, allowing UP kernel to behave similarly to SMP ones for the purpose of interrupt affinity - Let irq_set_chip_handler_name_locked() take a const struct irq_chip * - Tidy-up the NOMAP irqdomain API variant - Teach action_show() to use for_each_action_of_desc() - Make irq_chip_request_resources_parent() allow the parent callback to be optional - Remove dynamic allocations from populate_parent_alloc_arg() * New drivers: - Merge the long awaited IRQ support for the LoongArch architecture, with the provisional ACPICA update (to be reverted once the official support lands) - New Renesas RZ/G2L IRQC driver, equipped with its companion GPIO driver * Driver updates - Optimise the hot path operations for the SiFive PLIC, trading the locking for per-CPU priority masking masking operations which are apparently faster - Work around broken PLIC implementations that deal pretty badly with edge-triggered interrupts. Flag two implementations as affected. - Simplify the irq-stm32-exti driver, particularly the table that remaps the interrupts from exti to the GIC, reducing the memory usage - Convert the ocelot irq_chip to being immutable - Check ioremap() return value in the MIPS GIC driver - Move MMP driver init function declarations into the common .h - The obligatory typo fixes Link: https://lore.kernel.org/all/20220727192356.1860546-1-maz@kernel.org |
||
Lukas Bulwahn
|
871808fd69 |
x86/configs: Update configs in x86_debug.config
Commit |
||
Shivnandan Kumar
|
8d36694245 |
PM: QoS: Add check to make sure CPU freq is non-negative
CPU frequency should never be negative. If some client driver calls freq_qos_update_request with a negative value which will be very high in absolute terms, then frequency QoS sets max CPU freq at fmax as it considers it's absolute value but it will add plist node with negative priority. plist node has priority from INT_MIN (highest) to INT_MAX(lowest). Once priority is set as negative, another client will not be able to reduce CPU frequency. Adding check to make sure CPU freq is non-negative will fix this problem. Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Tetsuo Handa
|
8386c414e2 |
PM: hibernate: defer device probing when resuming from hibernation
syzbot is reporting hung task at misc_open() [1], for there is a race window of AB-BA deadlock which involves probe_count variable. Currently wait_for_device_probe() from snapshot_open() from misc_open() can sleep forever with misc_mtx held if probe_count cannot become 0. When a device is probed by hub_event() work function, probe_count is incremented before the probe function starts, and probe_count is decremented after the probe function completed. There are three cases that can prevent probe_count from dropping to 0. (a) A device being probed stopped responding (i.e. broken/malicious hardware). (b) A process emulating a USB device using /dev/raw-gadget interface stopped responding for some reason. (c) New device probe requests keeps coming in before existing device probe requests complete. The phenomenon syzbot is reporting is (b). A process which is holding system_transition_mutex and misc_mtx is waiting for probe_count to become 0 inside wait_for_device_probe(), but the probe function which is called from hub_event() work function is waiting for the processes which are blocked at mutex_lock(&misc_mtx) to respond via /dev/raw-gadget interface. This patch mitigates (b) by deferring wait_for_device_probe() from snapshot_open() to snapshot_write() and snapshot_ioctl(). Please note that the possibility of (b) remains as long as any thread which is emulating a USB device via /dev/raw-gadget interface can be blocked by uninterruptible blocking operations (e.g. mutex_lock()). Please also note that (a) and (c) are not addressed. Regarding (c), we should change the code to wait for only one device which contains the image for resuming from hibernation. I don't know how to address (a), for use of timeout for wait_for_device_probe() might result in loss of user data in the image. Maybe we should require the userland to wait for the image device before opening /dev/snapshot interface. Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1] Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Marc Zyngier
|
2bd1753e8c |
Merge branch irq/misc-5.20 into irq/irqchip-next
* irq/misc-5.20: : . : Misc IRQ changes for 5.20: : : - Let irq_set_chip_handler_name_locked() take a const struct irq_chip * : : - Convert the ocelot irq_chip to being immutable (depends on the above) : : - Tidy-up the NOMAP irqdomain API variant : : - Teach action_show() to use for_each_action_of_desc() : : - Check ioremap() return value in the MIPS GIC driver : : - Move MMP driver init function declarations into the common .h : : - The obligatory typo fixes : . irqchip/mmp: Declare init functions in common header file irqchip/mips-gic: Check the return value of ioremap() in gic_of_init() genirq: Use for_each_action_of_desc in actions_show() irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains irqdomain: Report irq number for NOMAP domains irqchip/gic-v3: Fix comment typo pinctrl: ocelot: Make irq_chip immutable genirq: Allow irq_set_chip_handler_name_locked() to take a const irq_chip Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
Linus Torvalds
|
c2602a7ce0 |
A single fix to correct a wrong BUG_ON() condition for deboosted tasks.
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLdDPMACgkQEsHwGGHe VUqmvRAAjIKW411+4FH0QRS5itqugSSwx6gAqX8muM4dn5B/aY0P3I14ghxrxnTt QG9XuevCK1i4OpxKcZLALC5fRqLNKr20WAdO+sNJzJYz1krB264bHOy46UY3MuwL wv2/nTLiR0MQisLfjE8Cdot0bYcTnIyKqSDvdrfBxqmoKo553A8uRTMZl3iPYU+W e3NhmG0PPzWzSz4y/Autk1GQMHOKcvvcPdsAUI+S2FwiQt/TIZ/Px2152NSV5Q4Y TIYfN5ylNw4BZxkq9tM3NMrZnrhT4TRihlYDf7PFf3WHDgh5vQmtOUZvLpH/0ZVO KUGCH2BPpfTVL4WBxfB2ADJWXoudEVb5r00JdI6TI9yYUXUE726BOOs3TwH+xvhP nGcLGErJcvFMYABMvJ7tLQpcC5561MNnqfRBO3svcVkNRKQb7r7UGqUpoevkpXLw 63G+HxzDFbs0BOwaOr8hjUnhu78hKVjHXr6IbBTjda7P5WNQgTE0a9oD1JiLAJVa RLupgq0X0FlTQip+EtbOhdGPui1HTDzYbGRoXkOxFBND4Zce9DEIkF1exsnITQat hsvCdUjhqOX5aOlrKTSVYAi4utYm5GcOU84x4andvg7z9vYDJpqpDXHFtXjF/za9 TIj3W4PXbNHaPsMD7Ph4RF98HDyrrVvwFce4wfx5u9Xrix/lVHE= =jolV -----END PGP SIGNATURE----- Merge tag 'sched_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "A single fix to correct a wrong BUG_ON() condition for deboosted tasks" * tag 'sched_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/deadline: Fix BUG_ON condition for deboosted tasks |
||
Linus Torvalds
|
4ba1329cbb |
Urgent RCU pull request for v5.19
This pull request contains a pair of commits that fix |
||
Linus Torvalds
|
44e29e64cf |
watch-queue: remove spurious double semicolon
Sedat Dilek noticed that I had an extraneous semicolon at the end of a
line in the previous patch.
It's harmless, but unintentional, and while compilers just treat it as
an extra empty statement, for all I know some other tooling might warn
about it. So clean it up before other people notice too ;)
Fixes:
|
||
Cruz Zhao
|
91caa5ae24 |
sched/core: Fix the bug that task won't enqueue into core tree when update cookie
In function sched_core_update_cookie(), a task will enqueue into the core tree only when it enqueued before, that is, if an uncookied task is cookied, it will not enqueue into the core tree until it enqueue again, which will result in unnecessary force idle. Here follows the scenario: CPU x and CPU y are a pair of SMT siblings. 1. Start task a running on CPU x without sleeping, and task b and task c running on CPU y without sleeping. 2. We create a cookie and share it to task a and task b, and then we create another cookie and share it to task c. 3. Simpling core_forceidle_sum of task a and b from /proc/PID/sched And we will find out that core_forceidle_sum of task a takes 30% time of the sampling period, which shouldn't happen as task a and b have the same cookie. Then we migrate task a to CPU x', migrate task b and c to CPU y', where CPU x' and CPU y' are a pair of SMT siblings, and sampling again, we will found out that core_forceidle_sum of task a and b are almost zero. To solve this problem, we enqueue the task into the core tree if it's on rq. Fixes: 6e33cad0af49("sched: Trivial core scheduling cookie management") Signed-off-by: Cruz Zhao <CruzZhao@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com |
||
Nicolas Saenz Julienne
|
5c66d1b9b3 |
nohz/full, sched/rt: Fix missed tick-reenabling bug in dequeue_task_rt()
dequeue_task_rt() only decrements 'rt_rq->rt_nr_running' after having
called sched_update_tick_dependency() preventing it from re-enabling the
tick on systems that no longer have pending SCHED_RT tasks but have
multiple runnable SCHED_OTHER tasks:
dequeue_task_rt()
dequeue_rt_entity()
dequeue_rt_stack()
dequeue_top_rt_rq()
sub_nr_running() // decrements rq->nr_running
sched_update_tick_dependency()
sched_can_stop_tick() // checks rq->rt.rt_nr_running,
...
__dequeue_rt_entity()
dec_rt_tasks() // decrements rq->rt.rt_nr_running
...
Every other scheduler class performs the operation in the opposite
order, and sched_update_tick_dependency() expects the values to be
updated as such. So avoid the misbehaviour by inverting the order in
which the above operations are performed in the RT scheduler.
Fixes:
|
||
Juri Lelli
|
ddfc710395 |
sched/deadline: Fix BUG_ON condition for deboosted tasks
Tasks the are being deboosted from SCHED_DEADLINE might enter
enqueue_task_dl() one last time and hit an erroneous BUG_ON condition:
since they are not boosted anymore, the if (is_dl_boosted()) branch is
not taken, but the else if (!dl_prio) is and inside this one we
BUG_ON(!is_dl_boosted), which is of course false (BUG_ON triggered)
otherwise we had entered the if branch above. Long story short, the
current condition doesn't make sense and always leads to triggering of a
BUG.
Fix this by only checking enqueue flags, properly: ENQUEUE_REPLENISH has
to be present, but additional flags are not a problem.
Fixes:
|
||
Linus Torvalds
|
353f7988dd |
watchqueue: make sure to serialize 'wqueue->defunct' properly
When the pipe is closed, we mark the associated watchqueue defunct by calling watch_queue_clear(). However, while that is protected by the watchqueue lock, new watchqueue entries aren't actually added under that lock at all: they use the pipe->rd_wait.lock instead, and looking up that pipe happens without any locking. The watchqueue code uses the RCU read-side section to make sure that the wqueue entry itself hasn't disappeared, but that does not protect the pipe_info in any way. So make sure to actually hold the wqueue lock when posting watch events, properly serializing against the pipe being torn down. Reported-by: Noam Rathaus <noamr@ssd-disclosure.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Marc Zyngier
|
0fa72ed05e |
Merge branch irq/loongarch into irq/irqchip-next
* irq/loongarch: : . : Merge the long awaited IRQ support for the LoongArch architecture. : : From the cover letter: : : "Currently, LoongArch based processors (e.g. Loongson-3A5000) : can only work together with LS7A chipsets. The irq chips in : LoongArch computers include CPUINTC (CPU Core Interrupt : Controller), LIOINTC (Legacy I/O Interrupt Controller), : EIOINTC (Extended I/O Interrupt Controller), PCH-PIC (Main : Interrupt Controller in LS7A chipset), PCH-LPC (LPC Interrupt : Controller in LS7A chipset) and PCH-MSI (MSI Interrupt Controller)." : : Note that this comes with non-official, arch private ACPICA : definitions until the official ACPICA update is realeased. : . irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch irqchip: Add LoongArch CPU interrupt controller support irqchip: Add Loongson Extended I/O interrupt controller support irqchip/loongson-liointc: Add ACPI init support irqchip/loongson-pch-msi: Add ACPI init support irqchip/loongson-pch-pic: Add ACPI init support irqchip: Add Loongson PCH LPC controller support LoongArch: Prepare to support multiple pch-pic and pch-msi irqdomain LoongArch: Use ACPI_GENERIC_GSI for gsi handling genirq/generic_chip: Export irq_unmap_generic_chip ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback APCI: irq: Add support for multiple GSI domains LoongArch: Provisionally add ACPICA data structures Signed-off-by: Marc Zyngier <maz@kernel.org> |
||
Paran Lee
|
c904cda044 |
genirq: Use for_each_action_of_desc in actions_show()
Refactor action_show() to use for_each_action_of_desc instead of a similar open-coded loop. Signed-off-by: Paran Lee <p4ranlee@gmail.com> [maz: reword commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220710112614.19410-1-p4ranlee@gmail.com |
||
Jianmin Lv
|
d319a299f4 |
genirq/generic_chip: Export irq_unmap_generic_chip
Some irq controllers have to re-implement a private version for irq_generic_chip_ops, because they have a different xlate to translate hwirq. Export irq_unmap_generic_chip to allow reusing in drivers. Signed-off-by: Jianmin Lv <lvjianmin@loongson.cn> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1658314292-35346-5-git-send-email-lvjianmin@loongson.cn |
||
Neeraj Upadhyay
|
4f2bfd9494 |
srcu: Make expedited RCU grace periods block even less frequently
The purpose of commit |
||
Paul E. McKenney
|
8f870e6eb8 |
srcu: Block less aggressively for expedited grace periods
Commit |
||
Xu Qiang
|
ef50cd57a7 |
irqdomain: Use hwirq_max instead of revmap_size for NOMAP domains
NOMAP irq domains use the revmap_size field to indicate the maximum hwirq number the domain accepts. This is a bit confusing as revmap_size is usually used to indicate the size of the revmap array, which a NOMAP domain doesn't have. Instead, use the hwirq_max field which has the correct semantics, and keep revmap_size to 0 for a NOMAP domain. Signed-off-by: Xu Qiang <xuqiang36@huawei.com> [maz: commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220719063641.56541-3-xuqiang36@huawei.com |
||
Xu Qiang
|
6f194c99f4 |
irqdomain: Report irq number for NOMAP domains
When using a NOMAP domain, __irq_resolve_mapping() doesn't store
the Linux IRQ number at the address optionally provided by the caller.
While this isn't a huge deal (the returned value is guaranteed
to the hwirq that was passed as a parameter), let's honour the letter
of the API by writing the expected value.
Fixes:
|
||
Linus Torvalds
|
2b18593e4b |
- A single data race fix on the perf event cleanup path to avoid endless
loops due to insufficient locking -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmLTu80ACgkQEsHwGGHe VUrophAApPj8K9M6+JLeVKNocQMA+1XhWL/HRVmabI+1TpxO4/663wcbbUI04Z5e 51dGvvCBK413duoDUAn8tYAPjQStTrwqAS/toHcaSj+dDPHzZDd3M/Gn68SRy08d if26OjsXIGTZoHsCYJx0y01m9XHY4ZhVTtonsc3jZCmb/b8/feSBZcMtw+tASDAw 8m/P9rHfzVlfBYmZnyf2NH24NTVcHgoQUGobDo16ve1CTvH8d4jEr+YPsNLTYN+P 4cUslnvRG4HhC/u8namO8CbQVuXicyJBVdVBtfUJ0+IKojie7zCkVUOIPv+mWgQ7 r1XE2MPSeFQRa0IptiA0vIXQCgs9BRj6cBzgo2f3Y0QjU0GGKLTcIKrILv95aej7 X12+uNLKfnkYU4vuyG4o4AnXh047YxgfWLSQ569c/hHKuw8klTQkh0PbJEs6Epn0 21dU+9/p66ZPTCXXjEDDNsMHeVY00+lkdEOu9YzNzMUfR5Fo+zbAN7X9jiDAQDqc D9IdDeEmhdmrEKNOkankMTBF1tG1XiU5zWerREeMHRMKpJhxC5X1BGIDpuEq1PJD xa7uAPvc0O6WmNfVvXaJ2GFPzx8oq9inlocNk/0I2ZJxgkGFqKCYUZQI0AdtzPAj dHx66z09uXMQN+ecXwf5pF1QS/R6BEajOaUhBEFPUZ21pkEl12c= =/ETy -----END PGP SIGNATURE----- Merge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - A single data race fix on the perf event cleanup path to avoid endless loops due to insufficient locking * tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix data race between perf_event_set_output() and perf_mmap_close() |
||
Linus Torvalds
|
be9b7b6acf |
Printk fixup for 5.19-rc7
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmLRfc8ACgkQUqAMR0iA lPIoGg/+M1WzHrSD4R9Per6WijKUxh3iL227uUd4QgsXKyA+2B/LMNOx6cGY24UW xo1hCKcvn0Q2xSKC96fuMPWgax7tjrnCGY3Jii095Q3pCIVCjknYi9tVq9GWlnkK FmwsxQMdX8llQPz4STttRISAq1E/RFOu4ImDvsBhO/45pW1f6lX+ITWixuMuqcRU X1ILQZ6gxuO9KDOKxfv7Go5owDSaWqYK7skjfIFlfDUy0o2p4moqndwj4OQWdsAU UOJvEeUc/ExvGW//xxkkuekGEqlsTpFj7LJeYl5jwT1FxNhVRVcrM1ds1Q3NApg4 9pyVdzQBgf+ZhBLPn1MqMEitSVz36A0lt41kUMdZ2g5pgHTPpqsgUQrCiqmUTJUo mM/7QvYDw4qFaPfxRSNWI4Nsy/dOevTcIJQhJC/nMKVGMnBv1C9xK9uzQuooK7BF zQXZeuktYjjhc115yYtFh22u1IEkRcttHd6aIqNAkplSVB+CmrRZuhmfNmJomQgD Rqn58fcHUvQYMtS9H14W2cKgpifG0uN1Qjq0hZ81bT8cSjNiVJQklifDtsEj+Oor sK7mLxmDdYhwcGHGz6Pt6iMLZbzUxgGcIMGUIIcYRakafttKwS/Wq6yIACB/zzkE LMxiSASOJDX7bh0qZNoOAekz3YUbhIr9PIIs9/OS22U2mL2LXcA= =vRnn -----END PGP SIGNATURE----- Merge tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Make pr_flush() fast when consoles are suspended. * tag 'printk-for-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: do not wait for consoles when suspended |
||
Lukasz Luba
|
ae6ccaa650 |
PM: EM: convert power field to micro-Watts precision and align drivers
The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
Linus Torvalds
|
862161e8af |
Only one fix for sysctl
-----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmLQeXQSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinD7MQAI74hP6y3CvjkTbOHclCMSXfS3RFy9L+ 9zXR7u5X0dNNI/7iKseZS+QUphjuF2+8jyzFd66+tXwh2MabJKQ0otOfPgeR2QxR ciwSRwL9TTt93cDbNYcmgMYtD9TP4QfOBcVrGsF/vqd4GdnbyWJrXmXymDgugyLi fTjnYMpDy5lrv4iqjbPZneyF2Ozu9GXojnRvxMDxVxpvGyT1AdupF+W2s9rXpP2p ESoVKTWj3qRmgbW79rG+jFOwhH8Q0ItnDmhGYJ329iIMaVZYbto3OiVeULNIifhb AE0JG7/FadAWo6JGeqGcQuzoMs/90ASPL1DQ/WWexLmJO/hPVX8Lr+DANB6+YY// XozQ5j8bis8OwWTXN83fKLLOm+rL6rf/Y2Hg+dXdyDN5JKUOKGGqBaT2tsy6fW/G 83DMc9YWZVdTnkRaPXvcRc9r59A+5t9OMRKPUct5wHb/T5f7tuFOEeAdvPBYPiW2 HQnaVMMUCaA7EsxbYYGodFq2jlaZNF80twlZAUogxIcfhAfOp6hPChVv7yIP/vtX vX/SmoN7aidBU/TQu1Qit3mtDYU5jT0Vgl/T9J+i2b5atoUMkJPJ42PivNkRjOJb Yg2/QLywccCa+q3gwfnJUxLkS81r2O7PzTT/gp2UH6eOL409viX4WfmF8EVsQHlS y52rqN/x5mu7 =a6rR -----END PGP SIGNATURE----- Merge tag 'sysctl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pyll sysctl fix from Luis Chamberlain: "Only one fix for sysctl" * tag 'sysctl-fixes-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE |
||
Petr Mladek
|
1ac8ec2731 | Merge branch 'rework/kthreads' into for-linus | ||
John Ogness
|
9023ca0866 |
printk: do not wait for consoles when suspended
The console_stop() and console_start() functions call pr_flush().
When suspending, these functions are called by the serial subsystem
while the serial port is suspended. In this scenario, if there are
any pending messages, a call to pr_flush() will always result in a
timeout because the serial port cannot make forward progress. This
causes longer suspend and resume times.
Add a check in pr_flush() so that it will immediately timeout if
the consoles are suspended.
Fixes:
|
||
Muchun Song
|
43b5240ca6 |
mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
"numa_stat" should not be included in the scope of CONFIG_HUGETLB_PAGE, if
CONFIG_HUGETLB_PAGE is not configured even if CONFIG_NUMA is configured,
"numa_stat" is missed form /proc. Move it out of CONFIG_HUGETLB_PAGE to
fix it.
Fixes:
|
||
Linus Torvalds
|
9bd572ec7a |
Including fixes from netfilter, bpf and wireless.
Current release - regressions:
- wifi: rtw88: fix write to const table of channel parameters
Current release - new code bugs:
- mac80211: add gfp_t parameter to
ieeee80211_obss_color_collision_notify
- mlx5:
- TC, allow offload from uplink to other PF's VF
- Lag, decouple FDB selection and shared FDB
- Lag, correct get the port select mode str
- bnxt_en: fix and simplify XDP transmit path
- r8152: fix accessing unset transport header
Previous releases - regressions:
- conntrack: fix crash due to confirmed bit load reordering
(after atomic -> refcount conversion)
- stmmac: dwc-qos: disable split header for Tegra194
Previous releases - always broken:
- mlx5e: ring the TX doorbell on DMA errors
- bpf: make sure mac_header was set before using it
- mac80211: do not wake queues on a vif that is being stopped
- mac80211: fix queue selection for mesh/OCB interfaces
- ip: fix dflt addr selection for connected nexthop
- seg6: fix skb checksums for SRH encapsulation/insertion
- xdp: fix spurious packet loss in generic XDP TX path
- bunch of sysctl data race fixes
- nf_log: incorrect offset to network header
Misc:
- bpf: add flags arg to bpf_dynptr_read and bpf_dynptr_write APIs
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmLQXuAACgkQMUZtbf5S
Irv3sBAAxoD5A0Q5zRLmfTvbXth8fVfWmqvDfxJvOcChr97Q/JyCTZrmSIqhIz85
6ADxF45PuOivpBU8dA3MF9gCtvlWcU6SJpRVZOP0v+FfZBESGdskG9OWXlS50mht
IF64LlEzfjvD8Mylf2xiuuuaDcDzuF9s2KXCBSh3qFDXP9VYPaSMjA22+YwApkvT
29EKSujBIod0ScIeP6rA7nZKtxNloGp+tGNeHqxP+LrALq5pQlwA43wTyvcgvfME
QgGsqUcn4UzaxJ6YIFNNwx+KRJI7JCdgxNupehaExdnvZJNHDuxSZKXwkCKFOhB6
vOQDDbfDCtTaFfw0elpF18hayUtDyl9ezAR1DlxZWwyPv46gHFlH/PreXLf4Zvvh
R8dAP5YLQjtNri3Ae8gdiQYzct0WXKjiauNdjF60Hh1dXe6j01Vbqh92J96Zr14U
uxDRWzKi1pyfrAULY4BB7sRLXc6IllcUFEnMmKYhYl7afV8VB0OjQ83VKjxW4az8
gcczXejgW6rNcV128vLYHICUCawoiIlA29efM17vGG7Q65O/vhqOxO8Moi1hiQN+
2GwMWxCQ3lIXz41oQ2TNt3ekDYuSFhj8T/qyQEOckp+QW91nbseJBIhyU7MF0WE9
e5sETK8CJMzQwF/zkJMAuohvc+IelGdhRayHVGBYWGwVN1CCqiU=
=TFnI
-----END PGP SIGNATURE-----
Merge tag 'net-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, bpf and wireless.
Still no major regressions, the release continues to be calm. An
uptick of fixes this time around due to trivial data race fixes and
patches flowing down from subtrees.
There has been a few driver fixes (particularly a few fixes for false
positives due to
|
||
Linus Torvalds
|
4adfa865bb |
integrity-v5.19-fix
-----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQQdXVVFGN5XqKr1Hj7LwZzRsCrn5QUCYtAwjhQcem9oYXJAbGlu dXguaWJtLmNvbQAKCRDLwZzRsCrn5TaxAQD2uVSa1/t9/cdTz3jWdWKrF080jChb uiYsZKA4RHbwjgEA8dCAa5zsfHX8Y0+vVqA65eyu1dQA98WbJDMQ4AaFVAg= =7Yy6 -----END PGP SIGNATURE----- Merge tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fixes from Mimi Zohar: "Here are a number of fixes for recently found bugs. Only 'ima: fix violation measurement list record' was introduced in the current release. The rest address existing bugs" * tag 'integrity-v5.19-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: ima: Fix potential memory leak in ima_init_crypto() ima: force signature verification when CONFIG_KEXEC_SIG is configured ima: Fix a potential integer overflow in ima_appraise_measurement ima: fix violation measurement list record Revert "evm: Fix memleak in init_desc" |
||
Linus Torvalds
|
d0b97f3891 |
cgroup fixes for v5.19-rc6
This pull request contains the fix for an old and subtle bug in the migration path. css_sets are used to track tasks and migrations are tasks moving from a group of css_sets to another group of css_sets. The migration path pins all source and destination css_sets in the prep stage. Unfortunately, it was overloading the same list_head entry to track sources and destinations, which got confused for migrations which are partially identity leading to use-after-frees. Fixed by using dedicated list_heads for tracking sources and destinations. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCYs48bg4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGb+HAQDNfUNLYajLjwJNklQdu/S4fxsg0qiY6J8SVkpo NYP2zQEAjZmBdNnW8MqutETBCwKq8v80gCphIT/Z72NNPStqPgQ= =r72j -----END PGP SIGNATURE----- Merge tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fix from Tejun Heo: "Fix an old and subtle bug in the migration path. css_sets are used to track tasks and migrations are tasks moving from a group of css_sets to another group of css_sets. The migration path pins all source and destination css_sets in the prep stage. Unfortunately, it was overloading the same list_head entry to track sources and destinations, which got confused for migrations which are partially identity leading to use-after-frees. Fixed by using dedicated list_heads for tracking sources and destinations" * tag 'cgroup-for-5.19-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Use separate src/dst nodes when preloading css_sets for migration |
||
Coiby Xu
|
af16df54b8 |
ima: force signature verification when CONFIG_KEXEC_SIG is configured
Currently, an unsigned kernel could be kexec'ed when IMA arch specific
policy is configured unless lockdown is enabled. Enforce kernel
signature verification check in the kexec_file_load syscall when IMA
arch specific policy is configured.
Fixes:
|
||
Kuniyuki Iwashima
|
7d1025e559 |
sysctl: Fix data-races in proc_dointvec_ms_jiffies().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_dointvec_ms_jiffies() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dointvec_ms_jiffies() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes:
|
||
Kuniyuki Iwashima
|
7dee5d7747 |
sysctl: Fix data-races in proc_dou8vec_minmax().
A sysctl variable is accessed concurrently, and there is always a chance
of data-race. So, all readers and writers need some basic protection to
avoid load/store-tearing.
This patch changes proc_dou8vec_minmax() to use READ_ONCE() and
WRITE_ONCE() internally to fix data-races on the sysctl side. For now,
proc_dou8vec_minmax() itself is tolerant to a data-race, but we still
need to add annotations on the other subsystem's side.
Fixes:
|
||
John Keeping
|
401e4963bf |
sched/core: Always flush pending blk_plug
With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two
normal priority tasks (SCHED_OTHER, nice level zero):
INFO: task kworker/u8:0:8 blocked for more than 491 seconds.
Not tainted 5.15.49-rt46 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u8:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000000
Workqueue: writeback wb_workfn (flush-7:0)
[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
[<c08a3d84>] (schedule) from [<c08a65a0>] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174)
[<c08a65a0>] (rt_mutex_slowlock_block.constprop.0) from [<c08a6708>]
+(rt_mutex_slowlock.constprop.0+0xac/0x174)
[<c08a6708>] (rt_mutex_slowlock.constprop.0) from [<c0374d60>] (fat_write_inode+0x34/0x54)
[<c0374d60>] (fat_write_inode) from [<c0297304>] (__writeback_single_inode+0x354/0x3ec)
[<c0297304>] (__writeback_single_inode) from [<c0297998>] (writeback_sb_inodes+0x250/0x45c)
[<c0297998>] (writeback_sb_inodes) from [<c0297c20>] (__writeback_inodes_wb+0x7c/0xb8)
[<c0297c20>] (__writeback_inodes_wb) from [<c0297f24>] (wb_writeback+0x2c8/0x2e4)
[<c0297f24>] (wb_writeback) from [<c0298c40>] (wb_workfn+0x1a4/0x3e4)
[<c0298c40>] (wb_workfn) from [<c0138ab8>] (process_one_work+0x1fc/0x32c)
[<c0138ab8>] (process_one_work) from [<c0139120>] (worker_thread+0x22c/0x2d8)
[<c0139120>] (worker_thread) from [<c013e6e0>] (kthread+0x16c/0x178)
[<c013e6e0>] (kthread) from [<c01000fc>] (ret_from_fork+0x14/0x38)
Exception stack(0xc10e3fb0 to 0xc10e3ff8)
3fa0: 00000000 00000000 00000000 00000000
3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
INFO: task tar:2083 blocked for more than 491 seconds.
Not tainted 5.15.49-rt46 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:tar state:D stack: 0 pid: 2083 ppid: 2082 flags:0x00000000
[<c08a3a10>] (__schedule) from [<c08a3d84>] (schedule+0xdc/0x134)
[<c08a3d84>] (schedule) from [<c08a41b0>] (io_schedule+0x14/0x24)
[<c08a41b0>] (io_schedule) from [<c08a455c>] (bit_wait_io+0xc/0x30)
[<c08a455c>] (bit_wait_io) from [<c08a441c>] (__wait_on_bit_lock+0x54/0xa8)
[<c08a441c>] (__wait_on_bit_lock) from [<c08a44f4>] (out_of_line_wait_on_bit_lock+0x84/0xb0)
[<c08a44f4>] (out_of_line_wait_on_bit_lock) from [<c0371fb0>] (fat_mirror_bhs+0xa0/0x144)
[<c0371fb0>] (fat_mirror_bhs) from [<c0372a68>] (fat_alloc_clusters+0x138/0x2a4)
[<c0372a68>] (fat_alloc_clusters) from [<c0370b14>] (fat_alloc_new_dir+0x34/0x250)
[<c0370b14>] (fat_alloc_new_dir) from [<c03787c0>] (vfat_mkdir+0x58/0x148)
[<c03787c0>] (vfat_mkdir) from [<c0277b60>] (vfs_mkdir+0x68/0x98)
[<c0277b60>] (vfs_mkdir) from [<c027b484>] (do_mkdirat+0xb0/0xec)
[<c027b484>] (do_mkdirat) from [<c0100060>] (ret_fast_syscall+0x0/0x1c)
Exception stack(0xc2e1bfa8 to 0xc2e1bff0)
bfa0: 01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000
bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190
bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc
Here the kworker is waiting on msdos_sb_info::s_lock which is held by
tar which is in turn waiting for a buffer which is locked waiting to be
flushed, but this operation is plugged in the kworker.
The lock is a normal struct mutex, so tsk_is_pi_blocked() will always
return false on !RT and thus the behaviour changes for RT.
It seems that the intent here is to skip blk_flush_plug() in the case
where a non-preemptible lock (such as a spinlock) has been converted to
a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT
schedule flag. But sched_submit_work() is only called from schedule()
which is never called in this scenario, so the check can simply be
deleted.
Looking at the history of the -rt patchset, in fact this change was
present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part
of a larger patch [1] most of which was replaced by commit
|
||
Vincent Guittot
|
c82a69629c |
sched/fair: fix case with reduced capacity CPU
The capacity of the CPU available for CFS tasks can be reduced because of other activities running on the latter. In such case, it's worth trying to move CFS tasks on a CPU with more available capacity. The rework of the load balance has filtered the case when the CPU is classified to be fully busy but its capacity is reduced. Check if CPU's capacity is reduced while gathering load balance statistic and classify it group_misfit_task instead of group_fully_busy so we can try to move the load on another CPU. Reported-by: David Chen <david.chen@nutanix.com> Reported-by: Zhang Qiao <zhangqiao22@huawei.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: David Chen <david.chen@nutanix.com> Tested-by: Zhang Qiao <zhangqiao22@huawei.com> Link: https://lkml.kernel.org/r/20220708154401.21411-1-vincent.guittot@linaro.org |