e5bfcd30f88fdb0ce830229e7ccdeddcb7a59b04
38967 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
09583dfed2 |
Merge tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add support for 'artificial' Energy Models in which power
numbers for different entities may be in different scales, add support
for some new hardware, fix bugs and clean up code in multiple places.
Specifics:
- Update the Energy Model support code to allow the Energy Model to
be artificial, which means that the power values may not be on a
uniform scale with other devices providing power information, and
update the cpufreq_cooling and devfreq_cooling thermal drivers to
support artificial Energy Models (Lukasz Luba).
- Make DTPM check the Energy Model type (Lukasz Luba).
- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).
- Add CPU-based scaling support to passive devfreq governor (Saravana
Kannan, Chanwoo Choi).
- Update the rk3399_dmc devfreq driver (Brian Norris).
- Export dev_pm_ops instead of suspend() and resume() in the IIO
chemical scd30 driver (Jonathan Cameron).
- Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and
PM-runtime counterparts (Jonathan Cameron).
- Move symbol exports in the IIO chemical scd30 driver into the
IIO_SCD30 namespace (Jonathan Cameron).
- Avoid device PM-runtime usage count underflows (Rafael Wysocki).
- Allow dynamic debug to control printing of PM messages (David
Cohen).
- Fix some kernel-doc comments in hibernation code (Yang Li, Haowen
Bai).
- Preserve ACPI-table override during hibernation (Amadeusz
Sławiński).
- Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson).
- Make Intel RAPL power capping driver support the RaptorLake and
AlderLake N processors (Zhang Rui, Sumeet Pawnikar).
- Remove redundant store to value after multiply in the RAPL power
capping driver (Colin Ian King).
- Add AlderLake processor support to the intel_idle driver (Zhang
Rui).
- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).
- Fix cpufreq governor clean up code to avoid using kfree() directly
to free kobject-based items (Kevin Hao).
- Prepare cpufreq for powerpc's asm/prom.h cleanup (Christophe
Leroy).
- Make intel_pstate notify frequency invariance code when no_turbo is
turned on and off (Chen Yu).
- Add Sapphire Rapids OOB mode support to intel_pstate (Srinivas
Pandruvada).
- Make cpufreq avoid unnecessary frequency updates due to mismatch
between hardware and the frequency table (Viresh Kumar).
- Make remove_cpu_dev_symlink() clear the real_cpus mask to simplify
code (Viresh Kumar).
- Rearrange cpufreq_offline() and cpufreq_remove_dev() to make the
calling convention for some driver callbacks consistent (Rafael
Wysocki).
- Avoid accessing half-initialized cpufreq policies from the show()
and store() sysfs functions (Schspa Shi).
- Rearrange cpufreq_offline() to make the calling convention for some
driver callbacks consistent (Schspa Shi).
- Update CPPC handling in cpufreq (Pierre Gondois).
- Extend dev_pm_domain_detach() doc (Krzysztof Kozlowski).
- Move genpd's time-accounting to ktime_get_mono_fast_ns() (Ulf
Hansson).
- Improve the way genpd deals with its governors (Ulf Hansson).
- Update the turbostat utility to version 2022.04.16 (Len Brown, Dan
Merillat, Sumeet Pawnikar, Zephaniah E. Loss-Cutler-Hull, Chen Yu)"
* tag 'pm-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (94 commits)
PM: domains: Trust domain-idle-states from DT to be correct by genpd
PM: domains: Measure power-on/off latencies in genpd based on a governor
PM: domains: Allocate governor data dynamically based on a genpd governor
PM: domains: Clean up some code in pm_genpd_init() and genpd_remove()
PM: domains: Fix initialization of genpd's next_wakeup
PM: domains: Fixup QoS latency measurements for IRQ safe devices in genpd
PM: domains: Measure suspend/resume latencies in genpd based on governor
PM: domains: Move the next_wakeup variable into the struct gpd_timing_data
PM: domains: Allocate gpd_timing_data dynamically based on governor
PM: domains: Skip another warning in irq_safe_dev_in_sleep_domain()
PM: domains: Rename irq_safe_dev_in_no_sleep_domain() in genpd
PM: domains: Don't check PM_QOS_FLAG_NO_POWER_OFF in genpd
PM: domains: Drop redundant code for genpd always-on governor
PM: domains: Add GENPD_FLAG_RPM_ALWAYS_ON for the always-on governor
powercap: intel_rapl: remove redundant store to value after multiply
cpufreq: CPPC: Enable dvfs_possible_from_any_cpu
cpufreq: CPPC: Enable fast_switch
ACPI: CPPC: Assume no transition latency if no PCCT
ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported
ACPI: CPPC: Check _OSC for flexible address space
...
|
||
|
|
dc8af1ffd6 |
Merge tag 'seccomp-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp updates from Kees Cook: - Rework USER_NOTIF notification ordering and kill logic (Sargun Dhillon) - Improved PTRACE_O_SUSPEND_SECCOMP selftest (Jann Horn) - Gracefully handle failed unshare() in selftests (Yang Guang) - Spelling fix (Colin Ian King) * tag 'seccomp-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: Fix spelling mistake "Coud" -> "Could" selftests/seccomp: Add test for wait killable notifier selftests/seccomp: Refactor get_proc_stat to split out file reading code seccomp: Add wait_killable semantic to seccomp user notifier selftests/seccomp: Ensure that notifications come in FIFO order seccomp: Use FIFO semantics to order notifications selftests/seccomp: Add SKIP for failed unshare() selftests/seccomp: Test PTRACE_O_SUSPEND_SECCOMP without CAP_SYS_ADMIN |
||
|
|
0bf13a8436 |
Merge tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening updates from Kees Cook: - usercopy hardening expanded to check other allocation types (Matthew Wilcox, Yuanzheng Song) - arm64 stackleak behavioral improvements (Mark Rutland) - arm64 CFI code gen improvement (Sami Tolvanen) - LoadPin LSM block dev API adjustment (Christoph Hellwig) - Clang randstruct support (Bill Wendling, Kees Cook) * tag 'kernel-hardening-v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (34 commits) loadpin: stop using bdevname mm: usercopy: move the virt_addr_valid() below the is_vmalloc_addr() gcc-plugins: randstruct: Remove cast exception handling af_unix: Silence randstruct GCC plugin warning niu: Silence randstruct warnings big_keys: Use struct for internal payload gcc-plugins: Change all version strings match kernel randomize_kstack: Improve docs on requirements/rationale lkdtm/stackleak: fix CONFIG_GCC_PLUGIN_STACKLEAK=n arm64: entry: use stackleak_erase_on_task_stack() stackleak: add on/off stack variants lkdtm/stackleak: check stack boundaries lkdtm/stackleak: prevent unexpected stack usage lkdtm/stackleak: rework boundary management lkdtm/stackleak: avoid spurious failure stackleak: rework poison scanning stackleak: rework stack high bound handling stackleak: clarify variable names stackleak: rework stack low bound handling stackleak: remove redundant check ... |
||
|
|
ac2ab99072 |
Merge tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"These updates continue to refine the work began in 5.17 and 5.18 of
modernizing the RNG's crypto and streamlining and documenting its
code.
New for 5.19, the updates aim to improve entropy collection methods
and make some initial decisions regarding the "premature next" problem
and our threat model. The cloc utility now reports that random.c is
931 lines of code and 466 lines of comments, not that basic metrics
like that mean all that much, but at the very least it tells you that
this is very much a manageable driver now.
Here's a summary of the various updates:
- The random_get_entropy() function now always returns something at
least minimally useful. This is the primary entropy source in most
collectors, which in the best case expands to something like RDTSC,
but prior to this change, in the worst case it would just return 0,
contributing nothing. For 5.19, additional architectures are wired
up, and architectures that are entirely missing a cycle counter now
have a generic fallback path, which uses the highest resolution
clock available from the timekeeping subsystem.
Some of those clocks can actually be quite good, despite the CPU
not having a cycle counter of its own, and going off-core for a
stamp is generally thought to increase jitter, something positive
from the perspective of entropy gathering. Done very early on in
the development cycle, this has been sitting in next getting some
testing for a while now and has relevant acks from the archs, so it
should be pretty well tested and fine, but is nonetheless the thing
I'll be keeping my eye on most closely.
- Of particular note with the random_get_entropy() improvements is
MIPS, which, on CPUs that lack the c0 count register, will now
combine the high-speed but short-cycle c0 random register with the
lower-speed but long-cycle generic fallback path.
- With random_get_entropy() now always returning something useful,
the interrupt handler now collects entropy in a consistent
construction.
- Rather than comparing two samples of random_get_entropy() for the
jitter dance, the algorithm now tests many samples, and uses the
amount of differing ones to determine whether or not jitter entropy
is usable and how laborious it must be. The problem with comparing
only two samples was that if the cycle counter was extremely slow,
but just so happened to be on the cusp of a change, the slowness
wouldn't be detected. Taking many samples fixes that to some
degree.
This, combined with the other improvements to random_get_entropy(),
should make future unification of /dev/random and /dev/urandom
maybe more possible. At the very least, were we to attempt it again
today (we're not), it wouldn't break any of Guenter's test rigs
that broke when we tried it with 5.18. So, not today, but perhaps
down the road, that's something we can revisit.
- We attempt to reseed the RNG immediately upon waking up from system
suspend or hibernation, making use of the various timestamps about
suspend time and such available, as well as the usual inputs such
as RDRAND when available.
- Batched randomness now falls back to ordinary randomness before the
RNG is initialized. This provides more consistent guarantees to the
types of random numbers being returned by the various accessors.
- The "pre-init injection" code is now gone for good. I suspect you
in particular will be happy to read that, as I recall you
expressing your distaste for it a few months ago. Instead, to avoid
a "premature first" issue, while still allowing for maximal amount
of entropy availability during system boot, the first 128 bits of
estimated entropy are used immediately as it arrives, with the next
128 bits being buffered. And, as before, after the RNG has been
fully initialized, it winds up reseeding anyway a few seconds later
in most cases. This resulted in a pretty big simplification of the
initialization code and let us remove various ad-hoc mechanisms
like the ugly crng_pre_init_inject().
- The RNG no longer pretends to handle the "premature next" security
model, something that various academics and other RNG designs have
tried to care about in the past. After an interesting mailing list
thread, these issues are thought to be a) mainly academic and not
practical at all, and b) actively harming the real security of the
RNG by delaying new entropy additions after a potential compromise,
making a potentially bad situation even worse. As well, in the
first place, our RNG never even properly handled the premature next
issue, so removing an incomplete solution to a fake problem was
particularly nice.
This allowed for numerous other simplifications in the code, which
is a lot cleaner as a consequence. If you didn't see it before,
https://lore.kernel.org/lkml/YmlMGx6+uigkGiZ0@zx2c4.com/ may be a
thread worth skimming through.
- While the interrupt handler received a separate code path years ago
that avoids locks by using per-cpu data structures and a faster
mixing algorithm, in order to reduce interrupt latency, input and
disk events that are triggered in hardirq handlers were still
hitting locks and more expensive algorithms. Those are now
redirected to use the faster per-cpu data structures.
- Rather than having the fake-crypto almost-siphash-based random32
implementation be used right and left, and in many places where
cryptographically secure randomness is desirable, the batched
entropy code is now fast enough to replace that.
- As usual, numerous code quality and documentation cleanups. For
example, the initialization state machine now uses enum symbolic
constants instead of just hard coding numbers everywhere.
- Since the RNG initializes once, and then is always initialized
thereafter, a pretty heavy amount of code used during that
initialization is never used again. It is now completely cordoned
off using static branches and it winds up in the .text.unlikely
section so that it doesn't reduce cache compactness after the RNG
is ready.
- A variety of functions meant for waiting on the RNG to be
initialized were only used by vsprintf, and in not a particularly
optimal way. Replacing that usage with a more ordinary setup made
it possible to remove those functions.
- A cleanup of how we warn userspace about the use of uninitialized
/dev/urandom and uninitialized get_random_bytes() usage.
Interestingly, with the change you merged for 5.18 that attempts to
use jitter (but does not block if it can't), the majority of users
should never see those warnings for /dev/urandom at all now, and
the one for in-kernel usage is mainly a debug thing.
- The file_operations struct for /dev/[u]random now implements
.read_iter and .write_iter instead of .read and .write, allowing it
to also implement .splice_read and .splice_write, which makes
splice(2) work again after it was broken here (and in many other
places in the tree) during the set_fs() removal. This was a bit of
a last minute arrival from Jens that hasn't had as much time to
bake, so I'll be keeping my eye on this as well, but it seems
fairly ordinary. Unfortunately, read_iter() is around 3% slower
than read() in my tests, which I'm not thrilled about. But Jens and
Al, spurred by this observation, seem to be making progress in
removing the bottlenecks on the iter paths in the VFS layer in
general, which should remove the performance gap for all drivers.
- Assorted other bug fixes, cleanups, and optimizations.
- A small SipHash cleanup"
* tag 'random-5.19-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (49 commits)
random: check for signals after page of pool writes
random: wire up fops->splice_{read,write}_iter()
random: convert to using fops->write_iter()
random: convert to using fops->read_iter()
random: unify batched entropy implementations
random: move randomize_page() into mm where it belongs
random: remove mostly unused async readiness notifier
random: remove get_random_bytes_arch() and add rng_has_arch_random()
random: move initialization functions out of hot pages
random: make consistent use of buf and len
random: use proper return types on get_random_{int,long}_wait()
random: remove extern from functions in header
random: use static branch for crng_ready()
random: credit architectural init the exact amount
random: handle latent entropy and command line from random_init()
random: use proper jiffies comparison macro
random: remove ratelimiting for in-kernel unseeded randomness
random: move initialization out of reseeding hot path
random: avoid initializing twice in credit race
random: use symbolic constants for crng_init states
...
|
||
|
|
eadb2f47a3 |
lockdown: also lock down previous kgdb use
KGDB and KDB allow read and write access to kernel memory, and thus should be restricted during lockdown. An attacker with access to a serial port (for example, via a hypervisor console, which some cloud vendors provide over the network) could trigger the debugger so it is important that the debugger respect the lockdown mode when/if it is triggered. Fix this by integrating lockdown into kdb's existing permissions mechanism. Unfortunately kgdb does not have any permissions mechanism (although it certainly could be added later) so, for now, kgdb is simply and brutally disabled by immediately exiting the gdb stub without taking any action. For lockdowns established early in the boot (e.g. the normal case) then this should be fine but on systems where kgdb has set breakpoints before the lockdown is enacted than "bad things" will happen. CVE: CVE-2022-21499 Co-developed-by: Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
|
|
6f3f04c190 |
Merge tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Updates to scheduler metrics:
- PELT fixes & enhancements
- PSI fixes & enhancements
- Refactor cpu_util_without()
- Updates to instrumentation/debugging:
- Remove sched_trace_*() helper functions - can be done via debug
info
- Fix double update_rq_clock() warnings
- Introduce & use "preemption model accessors" to simplify some of the
Kconfig complexity.
- Make softirq handling RT-safe.
- Misc smaller fixes & cleanups.
* tag 'sched-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
topology: Remove unused cpu_cluster_mask()
sched: Reverse sched_class layout
sched/deadline: Remove superfluous rq clock update in push_dl_task()
sched/core: Avoid obvious double update_rq_clock warning
smp: Make softirq handling RT safe in flush_smp_call_function_queue()
smp: Rename flush_smp_call_function_from_idle()
sched: Fix missing prototype warnings
sched/fair: Remove cfs_rq_tg_path()
sched/fair: Remove sched_trace_*() helper functions
sched/fair: Refactor cpu_util_without()
sched/fair: Revise comment about lb decision matrix
sched/psi: report zeroes for CPU full at the system level
sched/fair: Delete useless condition in tg_unthrottle_up()
sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq
sched/fair: Move calculate of avg_load to a better location
mailmap: Update my email address to @redhat.com
MAINTAINERS: Add myself as scheduler topology reviewer
psi: Fix trigger being fired unexpectedly at initial
ftrace: Use preemption model accessors for trace header printout
kcsan: Use preemption model accessors
|
||
|
|
cfeb2522c3 |
Merge tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
"Platform PMU changes:
- x86/intel:
- Add new Intel Alder Lake and Raptor Lake support
- x86/amd:
- AMD Zen4 IBS extensions support
- Add AMD PerfMonV2 support
- Add AMD Fam19h Branch Sampling support
Generic changes:
- signal: Deliver SIGTRAP on perf event asynchronously if blocked
Perf instrumentation can be driven via SIGTRAP, but this causes a
problem when SIGTRAP is blocked by a task & terminate the task.
Allow user-space to request these signals asynchronously (after
they get unblocked) & also give the information to the signal
handler when this happens:
"To give user space the ability to clearly distinguish
synchronous from asynchronous signals, introduce
siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for
flags in case more binary information is required in future).
The resolution to the problem is then to (a) no longer force the
signal (avoiding the terminations), but (b) tell user space via
si_perf_flags if the signal was synchronous or not, so that such
signals can be handled differently (e.g. let user space decide
to ignore or consider the data imprecise). "
- Unify/standardize the /sys/devices/cpu/events/* output format.
- Misc fixes & cleanups"
* tag 'perf-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits)
perf/x86/amd/core: Fix reloading events for SVM
perf/x86/amd: Run AMD BRS code only on supported hw
perf/x86/amd: Fix AMD BRS period adjustment
perf/x86/amd: Remove unused variable 'hwc'
perf/ibs: Fix comment
perf/amd/ibs: Advertise zen4_ibs_extensions as pmu capability attribute
perf/amd/ibs: Add support for L3 miss filtering
perf/amd/ibs: Use ->is_visible callback for dynamic attributes
perf/amd/ibs: Cascade pmu init functions' return value
perf/x86/uncore: Add new Alder Lake and Raptor Lake support
perf/x86/uncore: Clean up uncore_pci_ids[]
perf/x86/cstate: Add new Alder Lake and Raptor Lake support
perf/x86/msr: Add new Alder Lake and Raptor Lake support
perf/x86: Add new Alder Lake and Raptor Lake support
perf/amd/ibs: Use interrupt regs ip for stack unwinding
perf/x86/amd/core: Add PerfMonV2 overflow handling
perf/x86/amd/core: Add PerfMonV2 counter control
perf/x86/amd/core: Detect available counters
perf/x86/amd/core: Detect PerfMonV2 support
x86/msr: Add PerfCntrGlobal* registers
...
|
||
|
|
22922deae1 |
Merge tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Comprehensive interface overhaul:
=================================
Objtool's interface has some issues:
- Several features are done unconditionally, without any way to
turn them off. Some of them might be surprising. This makes
objtool tricky to use, and prevents porting individual features
to other arches.
- The config dependencies are too coarse-grained. Objtool
enablement is tied to CONFIG_STACK_VALIDATION, but it has several
other features independent of that.
- The objtool subcmds ("check" and "orc") are clumsy: "check" is
really a subset of "orc", so it has all the same options.
The subcmd model has never really worked for objtool, as it only
has a single purpose: "do some combination of things on an object
file".
- The '--lto' and '--vmlinux' options are nonsensical and have
surprising behavior.
Overhaul the interface:
- get rid of subcmds
- make all features individually selectable
- remove and/or clarify confusing/obsolete options
- update the documentation
- fix some bugs found along the way
- Fix x32 regression
- Fix Kbuild cleanup bugs
- Add scripts/objdump-func helper script to disassemble a single
function from an object file.
- Rewrite scripts/faddr2line to be section-aware, by basing it on
'readelf', moving it away from 'nm', which doesn't handle multiple
sections well, which can result in decoding failure.
- Rewrite & fix symbol handling - which had a number of bugs wrt.
object files that don't have global symbols - which is rare but
possible. Also fix a bunch of symbol handling bugs found along the
way.
* tag 'objtool-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
objtool: Fix objtool regression on x32 systems
objtool: Fix symbol creation
scripts/faddr2line: Fix overlapping text section failures
scripts: Create objdump-func helper script
objtool: Remove libsubcmd.a when make clean
objtool: Remove inat-tables.c when make clean
objtool: Update documentation
objtool: Remove --lto and --vmlinux in favor of --link
objtool: Add HAVE_NOINSTR_VALIDATION
objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION"
objtool: Make noinstr hacks optional
objtool: Make jump label hack optional
objtool: Make static call annotation optional
objtool: Make stack validation frame-pointer-specific
objtool: Add CONFIG_OBJTOOL
objtool: Extricate sls from stack validation
objtool: Rework ibt and extricate from stack validation
objtool: Make stack validation optional
objtool: Add option to print section addresses
objtool: Don't print parentheses in function addresses
...
|
||
|
|
2319be1356 |
Merge tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- rwsem cleanups & optimizations/fixes:
- Conditionally wake waiters in reader/writer slowpaths
- Always try to wake waiters in out_nolock path
- Add try_cmpxchg64() implementation, with arch optimizations - and use
it to micro-optimize sched_clock_{local,remote}()
- Various force-inlining fixes to address objdump instrumentation-check
warnings
- Add lock contention tracepoints:
lock:contention_begin
lock:contention_end
- Misc smaller fixes & cleanups
* tag 'locking-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
locking/atomic/x86: Introduce arch_try_cmpxchg64
locking/atomic: Add generic try_cmpxchg64 support
futex: Remove a PREEMPT_RT_FULL reference.
locking/qrwlock: Change "queue rwlock" to "queued rwlock"
lockdep: Delete local_irq_enable_in_hardirq()
locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning
locking: Apply contention tracepoints in the slow path
locking: Add lock contention tracepoints
locking/rwsem: Always try to wake waiters in out_nolock path
locking/rwsem: Conditionally wake waiters in reader/writer slowpaths
locking/rwsem: No need to check for handoff bit if wait queue empty
lockdep: Fix -Wunused-parameter for _THIS_IP_
x86/mm: Force-inline __phys_addr_nodebug()
x86/kvm/svm: Force-inline GHCB accessors
task_stack, x86/cea: Force-inline stack helpers
|
||
|
|
143a6252e1 |
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas: - Initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. No KVM support yet, SME is disabled in guests. - Support for crashkernel reservations above ZONE_DMA via the 'crashkernel=X,high' command line option. - btrfs search_ioctl() fix for live-lock with sub-page faults. - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring coherent I/O traffic, support for Arm's CMN-650 and CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup. - Kselftest updates for SME, BTI, MTE. - Automatic generation of the system register macros from a 'sysreg' file describing the register bitfields. - Update the type of the function argument holding the ESR_ELx register value to unsigned long to match the architecture register size (originally 32-bit but extended since ARMv8.0). - stacktrace cleanups. - ftrace cleanups. - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(), avoid executable mappings in kexec/hibernate code, drop TLB flushing from get_clear_flush() (and rename it to get_clear_contig()), ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits) arm64/sysreg: Generate definitions for FAR_ELx arm64/sysreg: Generate definitions for DACR32_EL2 arm64/sysreg: Generate definitions for CSSELR_EL1 arm64/sysreg: Generate definitions for CPACR_ELx arm64/sysreg: Generate definitions for CONTEXTIDR_ELx arm64/sysreg: Generate definitions for CLIDR_EL1 arm64/sve: Move sve_free() into SVE code section arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: kdump: Do not allocate crash low memory if not needed arm64/sve: Generate ZCR definitions arm64/sme: Generate defintions for SVCR arm64/sme: Generate SMPRI_EL1 definitions arm64/sme: Automatically generate SMPRIMAP_EL2 definitions arm64/sme: Automatically generate SMIDR_EL1 defines arm64/sme: Automatically generate defines for SMCR ... |
||
|
|
95fbef17e8 |
Merge tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens: - Make use of the IBM z16 processor activity instrumentation facility to count cryptography operations: add a new PMU device driver so that perf can make use of this. - Add new IBM z16 extended counter set to cpumf support. - Add vdso randomization support. - Add missing KCSAN instrumentation to barriers and spinlocks, which should make s390's KCSAN support complete. - Add support for IPL-complete-control facility: notify the hypervisor that kexec finished work and the kernel starts. - Improve error logging for PCI. - Various small changes to workaround llvm's integrated assembler limitations, and one bug, to make it finally possible to compile the kernel with llvm's integrated assembler. This also requires to raise the minimum clang version to 14.0.0. - Various other small enhancements, bug fixes, and cleanups all over the place. * tag 's390-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (48 commits) s390/head: get rid of 31 bit leftovers scripts/min-tool-version.sh: raise minimum clang version to 14.0.0 for s390 s390/boot: do not emit debug info for assembly with llvm's IAS s390/boot: workaround llvm IAS bug s390/purgatory: workaround llvm's IAS limitations s390/entry: workaround llvm's IAS limitations s390/alternatives: remove padding generation code s390/alternatives: provide identical sized orginal/alternative sequences s390/cpumf: add new extended counter set for IBM z16 s390/preempt: disable __preempt_count_add() optimization for PROFILE_ALL_BRANCHES s390/stp: clock_delta should be signed s390/stp: fix todoff size s390/pai: add support for cryptography counters entry: Rename arch_check_user_regs() to arch_enter_from_user_mode() s390/compat: cleanup compat_linux.h header file s390/entry: remove broken and not needed code s390/boot: convert parmarea to C s390/boot: convert initial lowcore to C s390/ptrace: move short psw definitions to ptrace header file s390/head: initialize all new psws ... |
||
|
|
8443516da6 |
Merge tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"This includes some small changes to kernel/stop_machine.c and arch/x86
which are deps of the new Intel IFS support.
Highlights:
- New drivers:
- Intel "In Field Scan" (IFS) support
- Winmate FM07/FM07P buttons
- Mellanox SN2201 support
- AMD PMC driver enhancements
- Lots of various other small fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (54 commits)
platform/x86/intel/ifs: Add CPU_SUP_INTEL dependency
platform/x86: intel_cht_int33fe: Set driver data
platform/x86: intel-hid: fix _DSM function index handling
platform/x86: toshiba_acpi: use kobj_to_dev()
platform/x86: samsung-laptop: use kobj_to_dev()
platform/x86: gigabyte-wmi: Add support for Z490 AORUS ELITE AC and X570 AORUS ELITE WIFI
tools/power/x86/intel-speed-select: Fix warning for perf_cap.cpu
tools/power/x86/intel-speed-select: Display error on turbo mode disabled
Documentation: In-Field Scan
platform/x86/intel/ifs: add ABI documentation for IFS
trace: platform/x86/intel/ifs: Add trace point to track Intel IFS operations
platform/x86/intel/ifs: Add IFS sysfs interface
platform/x86/intel/ifs: Add scan test support
platform/x86/intel/ifs: Authenticate and copy to secured memory
platform/x86/intel/ifs: Check IFS Image sanity
platform/x86/intel/ifs: Read IFS firmware image
platform/x86/intel/ifs: Add stub driver for In-Field Scan
stop_machine: Add stop_core_cpuslocked() for per-core operations
x86/msr-index: Define INTEGRITY_CAPABILITIES MSR
x86/microcode/intel: Expose collect_cpu_info_early() for IFS
...
|
||
|
|
3e2cbc016b |
Merge tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 splitlock updates from Borislav Petkov: - Add Raptor Lake to the set of CPU models which support splitlock - Make life miserable for apps using split locks by slowing them down considerably while the rest of the system remains responsive. The hope is it will hurt more and people will really fix their misaligned locks apps. As a result, free a TIF bit. * tag 'x86_splitlock_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Enable the split lock feature on Raptor Lake x86/split-lock: Remove unused TIF_SLD bit x86/split_lock: Make life miserable for split lockers |
||
|
|
de8ac81747 |
Merge tag 'x86_core_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core x86 updates from Borislav Petkov: - Remove all the code around GS switching on 32-bit now that it is not needed anymore - Other misc improvements * tag 'x86_core_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: bug: Use normal relative pointers in 'struct bug_entry' x86/nmi: Make register_nmi_handler() more robust x86/asm: Merge load_gs_index() x86/32: Remove lazy GS macros ELF: Remove elf_core_copy_kernel_regs() x86/32: Simplify ELF_CORE_COPY_REGS |
||
|
|
1de564b8c1 |
Merge tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 build updates from Borislav Petkov: - Add a "make x86_debug.config" target which enables a bunch of useful config debug options when trying to debug an issue - A gcc-12 build warnings fix * tag 'x86_build_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Wrap literal addresses in absolute_pointer() x86/configs: Add x86 debugging Kconfig fragment plus docs |
||
|
|
3a755ebcc2 |
Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel TDX support from Borislav Petkov: "Intel Trust Domain Extensions (TDX) support. This is the Intel version of a confidential computing solution called Trust Domain Extensions (TDX). This series adds support to run the kernel as part of a TDX guest. It provides similar guest protections to AMD's SEV-SNP like guest memory and register state encryption, memory integrity protection and a lot more. Design-wise, it differs from AMD's solution considerably: it uses a software module which runs in a special CPU mode called (Secure Arbitration Mode) SEAM. As the name suggests, this module serves as sort of an arbiter which the confidential guest calls for services it needs during its lifetime. Just like AMD's SNP set, this series reworks and streamlines certain parts of x86 arch code so that this feature can be properly accomodated" * tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits) x86/tdx: Fix RETs in TDX asm x86/tdx: Annotate a noreturn function x86/mm: Fix spacing within memory encryption features message x86/kaslr: Fix build warning in KASLR code in boot stub Documentation/x86: Document TDX kernel architecture ACPICA: Avoid cache flush inside virtual machines x86/tdx/ioapic: Add shared bit for IOAPIC base address x86/mm: Make DMA memory shared for TD guest x86/mm/cpa: Add support for TDX shared memory x86/tdx: Make pages shared in ioremap() x86/topology: Disable CPU online/offline control for TDX guests x86/boot: Avoid #VE during boot for TDX platforms x86/boot: Set CR0.NE early and keep it set during the boot x86/acpi/x86/boot: Add multiprocessor wake-up support x86/boot: Add a trampoline for booting APs via firmware handoff x86/tdx: Wire up KVM hypercalls x86/tdx: Port I/O: Add early boot support x86/tdx: Port I/O: Add runtime hypercalls x86/boot: Port I/O: Add decompression-time support for TDX x86/boot: Port I/O: Allow to hook up alternative helpers ... |
||
|
|
6e01f86fb2 |
Merge tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer and timekeeping updates from Thomas Gleixner: - Expose CLOCK_TAI to instrumentation to aid with TSN debugging. - Ensure that the clockevent is stopped when there is no timer armed to avoid pointless wakeups. - Make the sched clock frequency handling and rounding consistent. - Provide a better debugobject hint for delayed works. The timer callback is always the same, which makes it difficult to identify the underlying work. Use the work function as a hint instead. - Move the timer specific sysctl code into the timer subsystem. - The usual set of improvements and cleanups * tag 'timers-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers: Provide a better debugobjects hint for delayed works time/sched_clock: Fix formatting of frequency reporting code time/sched_clock: Use Hz as the unit for clock rate reporting below 4kHz time/sched_clock: Round the frequency reported to nearest rather than down timekeeping: Consolidate fast timekeeper timekeeping: Annotate ktime_get_boot_fast_ns() with data_race() timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped timekeeping: Introduce fast accessor to clock tai tracing/timer: Add missing argument documentation of trace points clocksource: Replace cpumask_weight() with cpumask_empty() timers: Move timer sysctl into the timer code clockevents: Use dedicated list iterator variable timers: Simplify calc_index() timers: Initialize base::next_expiry_recalc in timers_prepare_cpu() |
||
|
|
fcfde8a7cf |
Merge tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull interrupt handling updates from Thomas Gleixner:
"Core code:
- Make the managed interrupts more robust by shutting them down in
the core code when the assigned affinity mask does not contain
online CPUs.
- Make the irq simulator chip work on RT
- A small set of cpumask and power manageent cleanups
Drivers:
- A set of changes which mark GPIO interrupt chips immutable to
prevent the GPIO subsystem from modifying it under the hood. This
provides the necessary infrastructure and converts a set of GPIO
and pinctrl drivers over.
- A set of changes to make the pseudo-NMI handling for GICv3 more
robust: a missing barrier and consistent handling of the priority
mask.
- Another set of GICv3 improvements and fixes, but nothing
outstanding
- The usual set of improvements and cleanups all over the place
- No new irqchip drivers and not even a new device tree binding!
100+ interrupt chips are truly enough"
* tag 'irq-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
irqchip: Add Kconfig symbols for sunxi drivers
irqchip/gic-v3: Fix priority mask handling
irqchip/gic-v3: Refactor ISB + EOIR at ack time
irqchip/gic-v3: Ensure pseudo-NMIs have an ISB between ack and handling
genirq/irq_sim: Make the irq_work always run in hard irq context
irqchip/armada-370-xp: Do not touch Performance Counter Overflow on A375, A38x, A39x
irqchip/gic: Improved warning about incorrect type
irqchip/csky: Return true/false (not 1/0) from bool functions
irqchip/imx-irqsteer: Add runtime PM support
irqchip/imx-irqsteer: Constify irq_chip struct
irqchip/armada-370-xp: Enable MSI affinity configuration
irqchip/aspeed-scu-ic: Fix irq_of_parse_and_map() return value
irqchip/aspeed-i2c-ic: Fix irq_of_parse_and_map() return value
irqchip/sun6i-r: Use NULL for chip_data
irqchip/xtensa-mx: Fix initial IRQ affinity in non-SMP setup
irqchip/exiu: Fix acknowledgment of edge triggered interrupts
irqchip/gic-v3: Claim iomem resources
dt-bindings: interrupt-controller: arm,gic-v3: Make the v2 compat requirements explicit
irqchip/gic-v3: Relax polling of GIC{R,D}_CTLR.RWP
irqchip/gic-v3: Detect LPI invalidation MMIO registers
...
|
||
|
|
28c8f9fe94 |
Merge tag 'smp-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug updates from Thomas Gleixner: - Initialize the per-CPU structures during early boot so that the state is consistent from the very beginning. - Make the virtualization hotplug state handling more robust and let the core bringup CPUs which timed out in an earlier attempt again. - Make the x86/xen CPU state tracking consistent on a failed online attempt, so a consecutive bringup does not fall over the inconsistent state. * tag 'smp-core-2022-05-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Initialise all cpuhp_cpu_state structs earlier cpu/hotplug: Allow the CPU in CPU_UP_PREPARE state to be brought up again. x86/xen: Allow to retry if cpu_initialize_context() failed. |
||
|
|
115cd47132 |
Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
"Here are the core block changes for 5.19. This contains:
- blk-throttle accounting fix (Laibin)
- Series removing redundant assignments (Michal)
- Expose bio cache via the bio_set, so that DM can use it (Mike)
- Finish off the bio allocation interface cleanups by dealing with
the weirdest member of the family. bio_kmalloc combines a kmalloc
for the bio and bio_vecs with a hidden bio_init call and magic
cleanup semantics (Christoph)
- Clean up the block layer API so that APIs consumed by file systems
are (almost) only struct block_device based, so that file systems
don't have to poke into block layer internals like the
request_queue (Christoph)
- Clean up the blk_execute_rq* API (Christoph)
- Clean up various lose end in the blk-cgroup code to make it easier
to follow in preparation of reworking the blkcg assignment for bios
(Christoph)
- Fix use-after-free issues in BFQ when processes with merged queues
get moved to different cgroups (Jan)
- BFQ fixes (Jan)
- Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
Wolfgang, me)"
* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
blk-mq: fix typo in comment
bfq: Remove bfq_requeue_request_body()
bfq: Remove superfluous conversion from RQ_BIC()
bfq: Allow current waker to defend against a tentative one
bfq: Relax waker detection for shared queues
blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
blk-throttle: Set BIO_THROTTLED when bio has been throttled
blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
blk-cgroup: always terminate io.stat lines
block, bfq: make bfq_has_work() more accurate
block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
block: cleanup the VM accounting in submit_bio
block: Fix the bio.bi_opf comment
block: reorder the REQ_ flags
blk-iocost: combine local_stat and desc_stat to stat
block: improve the error message from bio_check_eod
block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
block: remove superfluous calls to blkcg_bio_issue_init
kthread: unexport kthread_blkcg
blk-cgroup: cleanup blkcg_maybe_throttle_current
...
|
||
|
|
3a166bdbf3 |
Merge tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"Here are the main io_uring changes for 5.19. This contains:
- Fixes for sparse type warnings (Christoph, Vasily)
- Support for multi-shot accept (Hao)
- Support for io_uring managed fixed files, rather than always
needing the applicationt o manage the indices (me)
- Fix for a spurious poll wakeup (Dylan)
- CQE overflow fixes (Dylan)
- Support more types of cancelations (me)
- Support for co-operative task_work signaling, rather than always
forcing an IPI (me)
- Support for doing poll first when appropriate, rather than always
attempting a transfer first (me)
- Provided buffer cleanups and support for mapped buffers (me)
- Improve how io_uring handles inflight SCM files (Pavel)
- Speedups for registered files (Pavel, me)
- Organize the completion data in a struct in io_kiocb rather than
keep it in separate spots (Pavel)
- task_work improvements (Pavel)
- Cleanup and optimize the submission path, in general and for
handling links (Pavel)
- Speedups for registered resource handling (Pavel)
- Support sparse buffers and file maps (Pavel, me)
- Various fixes and cleanups (Almog, Pavel, me)"
* tag 'for-5.19/io_uring-2022-05-22' of git://git.kernel.dk/linux-block: (111 commits)
io_uring: fix incorrect __kernel_rwf_t cast
io_uring: disallow mixed provided buffer group registrations
io_uring: initialize io_buffer_list head when shared ring is unregistered
io_uring: add fully sparse buffer registration
io_uring: use rcu_dereference in io_close
io_uring: consistently use the EPOLL* defines
io_uring: make apoll_events a __poll_t
io_uring: drop a spurious inline on a forward declaration
io_uring: don't use ERR_PTR for user pointers
io_uring: use a rwf_t for io_rw.flags
io_uring: add support for ring mapped supplied buffers
io_uring: add io_pin_pages() helper
io_uring: add buffer selection support to IORING_OP_NOP
io_uring: fix locking state for empty buffer group
io_uring: implement multishot mode for accept
io_uring: let fast poll support multishot
io_uring: add REQ_F_APOLL_MULTISHOT for requests
io_uring: add IORING_ACCEPT_MULTISHOT for accept
io_uring: only wake when the correct events are set
io_uring: avoid io-wq -EAGAIN looping for !IOPOLL
...
|
||
|
|
1e57930e9f |
Merge tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU update from Paul McKenney: - Documentation updates - Miscellaneous fixes - Callback-offloading updates, mainly simplifications - RCU-tasks updates, including some -rt fixups, handling of systems with sparse CPU numbering, and a fix for a boot-time race-condition failure - Put SRCU on a memory diet in order to reduce the size of the srcu_struct structure - Torture-test updates fixing some bugs in tests and closing some testing holes - Torture-test updates for the RCU tasks flavors, most notably ensuring that building rcutorture and friends does not change the RCU-tasks-related Kconfig options - Torture-test scripting updates - Expedited grace-period updates, most notably providing milliseconds-scale (not all that) soft real-time response from synchronize_rcu_expedited(). This is also the first time in almost 30 years of RCU that someone other than me has pushed for a reduction in the RCU CPU stall-warning timeout, in this case by more than three orders of magnitude from 21 seconds to 20 milliseconds. This tighter timeout applies only to expedited grace periods * tag 'rcu.2022.05.19a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (80 commits) rcu: Move expedited grace period (GP) work to RT kthread_worker rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT srcu: Drop needless initialization of sdp in srcu_gp_start() srcu: Prevent expedited GPs and blocking readers from consuming CPU srcu: Add contention check to call_srcu() srcu_data ->lock acquisition srcu: Automatically determine size-transition strategy at boot rcutorture: Make torture.sh allow for --kasan rcutorture: Make torture.sh refscale and rcuscale specify Tasks Trace RCU rcutorture: Make kvm.sh allow more memory for --kasan runs torture: Save "make allmodconfig" .config file scftorture: Remove extraneous "scf" from per_version_boot_params rcutorture: Adjust scenarios' Kconfig options for CONFIG_PREEMPT_DYNAMIC torture: Enable CSD-lock stall reports for scftorture torture: Skip vmlinux check for kvm-again.sh runs scftorture: Adjust for TASKS_RCU Kconfig option being selected rcuscale: Allow rcuscale without RCU Tasks Rude/Trace rcuscale: Allow rcuscale without RCU Tasks refscale: Allow refscale without RCU Tasks Rude/Trace refscale: Allow refscale without RCU Tasks rcutorture: Allow specifying per-scenario stat_interval ... |
||
|
|
16a23f394d |
Merge branches 'pm-em' and 'pm-cpuidle'
Marge Energy Model support updates and cpuidle updates for 5.19-rc1: - Update the Energy Model support code to allow the Energy Model to be artificial, which means that the power values may not be on a uniform scale with other devices providing power information, and update the cpufreq_cooling and devfreq_cooling thermal drivers to support artificial Energy Models (Lukasz Luba). - Make DTPM check the Energy Model type (Lukasz Luba). - Fix policy counter decrementation in cpufreq if Energy Model is in use (Pierre Gondois). - Add AlderLake processor support to the intel_idle driver (Zhang Rui). - Fix regression leading to no genpd governor in the PSCI cpuidle driver and fix the riscv-sbi cpuidle driver to allow a genpd governor to be used (Ulf Hansson). * pm-em: PM: EM: Decrement policy counter powercap: DTPM: Check for Energy Model type thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling Documentation: EM: Add artificial EM registration description PM: EM: Remove old debugfs files and print all 'flags' PM: EM: Change the order of arguments in the .active_power() callback PM: EM: Use the new .get_cost() callback while registering EM PM: EM: Add artificial EM flag PM: EM: Add .get_cost() callback * pm-cpuidle: cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used cpuidle: psci: Fix regression leading to no genpd governor intel_idle: Add AlderLake support |
||
|
|
95f2ce548a |
Merge branches 'pm-core', 'pm-sleep' and 'powercap'
Merge PM core changes, updates related to system sleep and power capping updates for 5.19-rc1: - Export dev_pm_ops instead of suspend() and resume() in the IIO chemical scd30 driver (Jonathan Cameron). - Add namespace variants of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and PM-runtime counterparts (Jonathan Cameron). - Move symbol exports in the IIO chemical scd30 driver into the IIO_SCD30 namespace (Jonathan Cameron). - Avoid device PM-runtime usage count underflows (Rafael Wysocki). - Allow dynamic debug to control printing of PM messages (David Cohen). - Fix some kernel-doc comments in hibernation code (Yang Li, Haowen Bai). - Preserve ACPI-table override during hibernation (Amadeusz Sławiński). - Improve support for suspend-to-RAM for PSCI OSI mode (Ulf Hansson). - Make Intel RAPL power capping driver support the RaptorLake and AlderLake N processors (Zhang Rui, Sumeet Pawnikar). - Remove redundant store to value after multiply in the RAPL power capping driver (Colin Ian King). * pm-core: PM: runtime: Avoid device usage count underflows iio: chemical: scd30: Move symbol exports into IIO_SCD30 namespace PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv iio: chemical: scd30: Export dev_pm_ops instead of suspend() and resume() * pm-sleep: cpuidle: PSCI: Improve support for suspend-to-RAM for PSCI OSI mode PM: runtime: Allow to call __pm_runtime_set_status() from atomic context PM: hibernate: Don't mark comment as kernel-doc x86/ACPI: Preserve ACPI-table override during hibernation PM: hibernate: Fix some kernel-doc comments PM: sleep: enable dynamic debug support within pm_pr_dbg() PM: sleep: Narrow down -DDEBUG on kernel/power/ files * powercap: powercap: intel_rapl: remove redundant store to value after multiply powercap: intel_rapl: add support for ALDERLAKE_N powercap: RAPL: Add Power Limit4 support for RaptorLake powercap: intel_rapl: add support for RaptorLake |
||
|
|
3ac6487e58 |
perf: Fix sys_perf_event_open() race against self
Norbert reported that it's possible to race sys_perf_event_open() such
that the looser ends up in another context from the group leader,
triggering many WARNs.
The move_group case checks for races against itself, but the
!move_group case doesn't, seemingly relying on the previous
group_leader->ctx == ctx check. However, that check is racy due to not
holding any locks at that time.
Therefore, re-check the result after acquiring locks and bailing
if they no longer match.
Additionally, clarify the not_move_group case from the
move_group-vs-move_group race.
Fixes:
|
||
|
|
201729d53a |
Merge branches 'for-next/sme', 'for-next/stacktrace', 'for-next/fault-in-subpage', 'for-next/misc', 'for-next/ftrace' and 'for-next/crashkernel', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: perf/arm-cmn: Decode CAL devices properly in debugfs perf/arm-cmn: Fix filter_sel lookup perf/marvell_cn10k: Fix tad_pmu_event_init() to check pmu type first drivers/perf: hisi: Add Support for CPA PMU drivers/perf: hisi: Associate PMUs in SICL with CPUs online drivers/perf: arm_spe: Expose saturating counter to 16-bit perf/arm-cmn: Add CMN-700 support perf/arm-cmn: Refactor occupancy filter selector perf/arm-cmn: Add CMN-650 support dt-bindings: perf: arm-cmn: Add CMN-650 and CMN-700 perf: check return value of armpmu_request_irq() perf: RISC-V: Remove non-kernel-doc ** comments * for-next/sme: (30 commits) : Scalable Matrix Extensions support. arm64/sve: Move sve_free() into SVE code section arm64/sve: Make kernel FPU protection RT friendly arm64/sve: Delay freeing memory in fpsimd_flush_thread() arm64/sme: More sensibly define the size for the ZA register set arm64/sme: Fix NULL check after kzalloc arm64/sme: Add ID_AA64SMFR0_EL1 to __read_sysreg_by_encoding() arm64/sme: Provide Kconfig for SME KVM: arm64: Handle SME host state when running guests KVM: arm64: Trap SME usage in guest KVM: arm64: Hide SME system registers from guests arm64/sme: Save and restore streaming mode over EFI runtime calls arm64/sme: Disable streaming mode and ZA when flushing CPU state arm64/sme: Add ptrace support for ZA arm64/sme: Implement ptrace support for streaming mode SVE registers arm64/sme: Implement ZA signal handling arm64/sme: Implement streaming SVE signal handling arm64/sme: Disable ZA and streaming mode when handling signals arm64/sme: Implement traps and syscall handling for SME arm64/sme: Implement ZA context switching arm64/sme: Implement streaming SVE context switching ... * for-next/stacktrace: : Stacktrace cleanups. arm64: stacktrace: align with common naming arm64: stacktrace: rename stackframe to unwind_state arm64: stacktrace: rename unwinder functions arm64: stacktrace: make struct stackframe private to stacktrace.c arm64: stacktrace: delete PCS comment arm64: stacktrace: remove NULL task check from unwind_frame() * for-next/fault-in-subpage: : btrfs search_ioctl() live-lock fix using fault_in_subpage_writeable(). btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults arm64: Add support for user sub-page fault probing mm: Add fault_in_subpage_writeable() to probe at sub-page granularity * for-next/misc: : Miscellaneous patches. arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: mm: Make arch_faults_on_old_pte() check for migratability arm64: mte: Clean up user tag accessors arm64/hugetlb: Drop TLB flush from get_clear_flush() arm64: Declare non global symbols as static arm64: mm: Cleanup useless parameters in zone_sizes_init() arm64: fix types in copy_highpage() arm64: Set ARCH_NR_GPIO to 2048 for ARCH_APPLE arm64: cputype: Avoid overflow using MIDR_IMPLEMENTOR_MASK arm64: document the boot requirements for MTE arm64/mm: Compute PTRS_PER_[PMD|PUD] independently of PTRS_PER_PTE * for-next/ftrace: : ftrace cleanups. arm64/ftrace: Make function graph use ftrace directly ftrace: cleanup ftrace_graph_caller enable and disable * for-next/crashkernel: : Support for crashkernel reservations above ZONE_DMA. arm64: kdump: Do not allocate crash low memory if not needed docs: kdump: Update the crashkernel description for arm64 of: Support more than one crash kernel regions for kexec -s of: fdt: Add memory for devices by DT property "linux,usable-memory-range" arm64: kdump: Reimplement crashkernel=X arm64: Use insert_resource() to simplify code kdump: return -ENOENT if required cmdline option does not exist |
||
|
|
cdb4913293 |
Merge tag 'irqchip-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core
Pull irqchip updates from Marc Zyngier: - Add new infrastructure to stop gpiolib from rewriting irq_chip structures behind our back. Convert a few of them, but this will obviously be a long effort. - A bunch of GICv3 improvements, such as using MMIO-based invalidations when possible, and reducing the amount of polling we perform when reconfiguring interrupts. - Another set of GICv3 improvements for the Pseudo-NMI functionality, with a nice cleanup making it easy to reason about the various states we can be in when an NMI fires. - The usual bunch of misc fixes and minor improvements. Link: https://lore.kernel.org/all/20220519165308.998315-1-maz@kernel.org |
||
|
|
546a3fee17 |
sched: Reverse sched_class layout
Because GCC-12 is fully stupid about array bounds and it's just really hard to get a solid array definition from a linker script, flip the array order to avoid needing negative offsets :-/ This makes the whole relational pointer magic a little less obvious, but alas. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/YoOLLmLG7HRTXeEm@hirez.programming.kicks-ass.net |
||
|
|
8491d1bdf5 |
sched/clock: Use try_cmpxchg64 in sched_clock_{local,remote}
Use try_cmpxchg64 instead of cmpxchg64 (*ptr, old, new) != old in
sched_clock_{local,remote}. x86 cmpxchg returns success in ZF flag,
so this change saves a compare after cmpxchg (and related move
instruction in front of cmpxchg).
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220518184953.3446778-1-ubizjak@gmail.com
|
||
|
|
d4150779e6 |
random32: use real rng for non-deterministic randomness
random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that does the same exact thing as random.c, except does it poorly. The first one has some use cases. The second one no longer does and can be replaced with calls to random.c's proper random number generator. The relatively recent siphash-based bad random32.c code was added in response to concerns that the prior random32.c was too deterministic. Out of fears that random.c was (at the time) too slow, this code was anonymously contributed. Then out of that emerged a kind of shadow entropy gathering system, with its own tentacles throughout various net code, added willy nilly. Stop👏making👏bespoke👏random👏number👏generators👏. Fortunately, recent advances in random.c mean that we can stop playing with this sketchiness, and just use get_random_u32(), which is now fast enough. In micro benchmarks using RDPMC, I'm seeing the same median cycle count between the two functions, with the mean being _slightly_ higher due to batches refilling (which we can optimize further need be). However, when doing *real* benchmarks of the net functions that actually use these random numbers, the mean cycles actually *decreased* slightly (with the median still staying the same), likely because the additional prandom code means icache misses and complexity, whereas random.c is generally already being used by something else nearby. The biggest benefit of this is that there are many users of prandom who probably should be using cryptographically secure random numbers. This makes all of those accidental cases become secure by just flipping a switch. Later on, we can do a tree-wide cleanup to remove the static inline wrapper functions that this commit adds. There are also some low-ish hanging fruits for making this even faster in the future: a get_random_u16() function for use in the networking stack will give a 2x performance boost there, using SIMD for ChaCha20 will let us compute 4 or 8 or 16 blocks of output in parallel, instead of just one, giving us large buffers for cheap, and introducing a get_random_*_bh() function that assumes irqs are already disabled will shave off a few cycles for ordinary calls. These are things we can chip away at down the road. Acked-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
||
|
|
69e9cd66ae |
audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: stable@vger.kernel.org
Fixes:
|
||
|
|
990e798d18 |
Merge tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner: "The recent expansion of the sched switch tracepoint inserted a new argument in the middle of the arguments. This reordering broke BPF programs which relied on the old argument list. While tracepoints are not considered stable ABI, it's not trivial to make BPF cope with such a change, but it's being worked on. For now restore the original argument order and move the new argument to the end of the argument list" * tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/tracing: Append prev_state to tp args instead |
||
|
|
fb756280f9 |
Merge tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner: "A single fix for a recent (introduced in 5.16) regression in the core interrupt code. The consolidation of the interrupt handler invocation code added an unconditional warning when generic_handle_domain_irq() is invoked from outside hard interrupt context. That's overbroad as the requirement for invoking these handlers in hard interrupt context is only required for certain interrupt types. The subsequently called code already contains a warning which triggers conditionally for interrupt chips which indicate this requirement in their properties. Remove the overbroad one" * tag 'irq-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq() |
||
|
|
21673fcb25 |
genirq/irq_sim: Make the irq_work always run in hard irq context
The IRQ simulator uses irq_work to trigger an interrupt. Without the IRQ_WORK_HARD_IRQ flag the irq_work will be performed in thread context on PREEMPT_RT. This causes locking errors later in handle_simple_irq() which expects to be invoked with disabled interrupts. Triggering individual interrupts in hardirq context should not lead to unexpected high latencies since this is also what the hardware controller does. Also it is used as a simulator so... Use IRQ_WORK_INIT_HARD() to carry out the irq_work in hardirq context on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/YnuZBoEVMGwKkLm+@linutronix.de |
||
|
|
317f29c14d |
timers: Provide a better debugobjects hint for delayed works
With debugobjects enabled the timer hint for freeing of active timers embedded inside delayed works is always the same, i.e. the hint is delayed_work_timer_fn, even though the function the delayed work is going to run can be wildly different depending on what work was queued. Enabling workqueue debugobjects doesn't help either because the delayed work isn't considered active until it is actually queued to run on a workqueue. If the work is freed while the timer is pending the work isn't considered active so there is no information from workqueue debugobjects. Special case delayed works in the timer debugobjects hint logic so that the delayed work function is returned instead of the delayed_work_timer_fn. This will help to understand which delayed work was pending that got freed. Apply the same treatment for kthread_delayed_work because it follows the same pattern. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Stephen Boyd <swboyd@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220511201951.42408-1-swboyd@chromium.org |
||
|
|
1366992e16 |
timekeeping: Add raw clock fallback for random_get_entropy()
The addition of random_get_entropy_fallback() provides access to whichever time source has the highest frequency, which is useful for gathering entropy on platforms without available cycle counters. It's not necessarily as good as being able to quickly access a cycle counter that the CPU has, but it's still something, even when it falls back to being jiffies-based. In the event that a given arch does not define get_cycles(), falling back to the get_cycles() default implementation that returns 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. Finally, since random_get_entropy_fallback() is used during extremely early boot when randomizing freelists in mm_init(), it can be called before timekeeping has been initialized. In that case there really is nothing we can do; jiffies hasn't even started ticking yet. So just give up and return 0. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Theodore Ts'o <tytso@mit.edu> |
||
|
|
6829061315 |
futex: Remove a PREEMPT_RT_FULL reference.
Earlier the PREEMPT_RT patch had a PREEMPT_RT_FULL and PREEMPT_RT_BASE Kconfig option. The latter was a subset of the functionality that was enabled with PREEMPT_RT_FULL and was mainly useful for debugging. During the merging efforts the two Kconfig options were abandoned in the v5.4.3-rt1 release and since then there is only PREEMPT_RT which enables the full features set (as PREEMPT_RT_FULL did in earlier releases). Replace the PREEMPT_RT_FULL reference with PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: André Almeida <andrealmeid@igalia.com> Link: https://lore.kernel.org/r/YnvWUvq1vpqCfCU7@linutronix.de |
||
|
|
0ac824f379 |
Merge branch 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo: "Waiman's fix for a cgroup2 cpuset bug where it could miss nodes which were hot-added" * 'for-5.18-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp() |
||
|
|
2760f5a415 |
stop_machine: Add stop_core_cpuslocked() for per-core operations
Hardware core level testing features require near simultaneous execution of WRMSR instructions on all threads of a core to initiate a test. Provide a customized cut down version of stop_machine_cpuslocked() that just operates on the threads of a single core. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com> |
||
|
|
9c2136be08 |
sched/tracing: Append prev_state to tp args instead
Commit |
||
|
|
ce13389053 |
Merge branch 'exp.2022.05.11a' into HEAD
exp.2022.05.11a: Expedited-grace-period latency-reduction updates. |
||
|
|
9621fbee44 |
rcu: Move expedited grace period (GP) work to RT kthread_worker
Enabling CONFIG_RCU_BOOST did not reduce RCU expedited grace-period latency because its workqueues run at SCHED_OTHER, and thus can be delayed by normal processes. This commit avoids these delays by moving the expedited GP work items to a real-time-priority kthread_worker. This option is controlled by CONFIG_RCU_EXP_KTHREAD and disabled by default on PREEMPT_RT=y kernels which disable expedited grace periods after boot by unconditionally setting rcupdate.rcu_normal_after_boot=1. The results were evaluated on arm64 Android devices (6GB ram) running 5.10 kernel, and capturing trace data in critical user-level code. The table below shows the resulting order-of-magnitude improvements in synchronize_rcu_expedited() latency: ------------------------------------------------------------------------ | | workqueues | kthread_worker | Diff | ------------------------------------------------------------------------ | Count | 725 | 688 | | ------------------------------------------------------------------------ | Min Duration (ns) | 326 | 447 | 37.12% | ------------------------------------------------------------------------ | Q1 (ns) | 39,428 | 38,971 | -1.16% | ------------------------------------------------------------------------ | Q2 - Median (ns) | 98,225 | 69,743 | -29.00% | ------------------------------------------------------------------------ | Q3 (ns) | 342,122 | 126,638 | -62.98% | ------------------------------------------------------------------------ | Max Duration (ns) | 372,766,967 | 2,329,671 | -99.38% | ------------------------------------------------------------------------ | Avg Duration (ns) | 2,746,353 | 151,242 | -94.49% | ------------------------------------------------------------------------ | Standard Deviation (ns) | 19,327,765 | 294,408 | | ------------------------------------------------------------------------ The below table show the range of maximums/minimums for synchronize_rcu_expedited() latency from all experiments: ------------------------------------------------------------------------ | | workqueues | kthread_worker | Diff | ------------------------------------------------------------------------ | Total No. of Experiments | 25 | 23 | | ------------------------------------------------------------------------ | Largest Maximum (ns) | 372,766,967 | 2,329,671 | -99.38% | ------------------------------------------------------------------------ | Smallest Maximum (ns) | 38,819 | 86,954 | 124.00% | ------------------------------------------------------------------------ | Range of Maximums (ns) | 372,728,148 | 2,242,717 | | ------------------------------------------------------------------------ | Largest Minimum (ns) | 88,623 | 27,588 | -68.87% | ------------------------------------------------------------------------ | Smallest Minimum (ns) | 326 | 447 | 37.12% | ------------------------------------------------------------------------ | Range of Minimums (ns) | 88,297 | 27,141 | | ------------------------------------------------------------------------ Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Tejun Heo <tj@kernel.org> Reported-by: Tim Murray <timmurray@google.com> Reported-by: Wei Wang <wvw@google.com> Tested-by: Kyle Lin <kylelin@google.com> Tested-by: Chunwei Lu <chunweilu@google.com> Tested-by: Lulu Wang <luluw@google.com> Signed-off-by: Kalesh Singh <kaleshsingh@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
|
|
28b3ae4265 |
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
Currently both expedited and regular grace period stall warnings use a single timeout value that with units of seconds. However, recent Android use cases problem require a sub-100-millisecond expedited RCU CPU stall warning. Given that expedited RCU grace periods normally complete in far less than a single millisecond, especially for small systems, this is not unreasonable. Therefore introduce the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT kernel configuration that defaults to 20 msec on Android and remains the same as that of the non-expedited stall warnings otherwise. It also can be changed in run-time via: /sys/.../parameters/rcu_exp_cpu_stall_timeout. [ paulmck: Default of zero to use CONFIG_RCU_STALL_TIMEOUT. ] Signed-off-by: Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> |
||
|
|
c9d8923bfb |
PM: EM: Decrement policy counter
In commit |
||
|
|
734387ec2f |
sched/deadline: Remove superfluous rq clock update in push_dl_task()
The change to call update_rq_clock() before activate_task() commit |
||
|
|
2679a83731 |
sched/core: Avoid obvious double update_rq_clock warning
When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.
Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, we have obtained the rq lock of other CPUs, the
rq->clock_update_flags of this CPU may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.
So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.
For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().
For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.
Some call trace reports:
Call Trace 1:
<IRQ>
sched_rt_period_timer+0x10f/0x3a0
? enqueue_top_rt_rq+0x110/0x110
__hrtimer_run_queues+0x1a9/0x490
hrtimer_interrupt+0x10b/0x240
__sysvec_apic_timer_interrupt+0x8a/0x250
sysvec_apic_timer_interrupt+0x9a/0xd0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20
Call Trace 2:
<TASK>
activate_task+0x8b/0x110
push_rt_task.part.108+0x241/0x2c0
push_rt_tasks+0x15/0x30
finish_task_switch+0xaa/0x2e0
? __switch_to+0x134/0x420
__schedule+0x343/0x8e0
? hrtimer_start_range_ns+0x101/0x340
schedule+0x4e/0xb0
do_nanosleep+0x8e/0x160
hrtimer_nanosleep+0x89/0x120
? hrtimer_init_sleeper+0x90/0x90
__x64_sys_nanosleep+0x96/0xd0
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Call Trace 3:
<TASK>
deactivate_task+0x93/0xe0
pull_rt_task+0x33e/0x400
balance_rt+0x7e/0x90
__schedule+0x62f/0x8e0
do_task_dead+0x3f/0x50
do_exit+0x7b8/0xbb0
do_group_exit+0x2d/0x90
get_signal+0x9df/0x9e0
? preempt_count_add+0x56/0xa0
? __remove_hrtimer+0x35/0x70
arch_do_signal_or_restart+0x36/0x720
? nanosleep_copyout+0x39/0x50
? do_nanosleep+0x131/0x160
? audit_filter_inodes+0xf5/0x120
exit_to_user_mode_prepare+0x10f/0x1e0
syscall_exit_to_user_mode+0x17/0x30
do_syscall_64+0x40/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
Call Trace 4:
update_rq_clock+0x128/0x1a0
migrate_task_rq_dl+0xec/0x310
set_task_cpu+0x84/0x1e4
try_to_wake_up+0x1d8/0x5c0
wake_up_process+0x1c/0x30
hrtimer_wakeup+0x24/0x3c
__hrtimer_run_queues+0x114/0x270
hrtimer_interrupt+0xe8/0x244
arch_timer_handler_phys+0x30/0x50
handle_percpu_devid_irq+0x88/0x140
generic_handle_domain_irq+0x40/0x60
gic_handle_irq+0x48/0xe0
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x80/0x84
Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Link: https://lore.kernel.org/r/20220430085843.62939-2-jiahao.os@bytedance.com
|
||
|
|
47319846a9 |
Merge branch 'v5.18-rc5'
Obtain the new INTEL_FAM6 stuff required. Signed-off-by: Peter Zijlstra <peterz@infradead.org> |
||
|
|
434e09e757 |
locking/qrwlock: Change "queue rwlock" to "queued rwlock"
Queued rwlock was originally named "queue rwlock" which wasn't quite grammatically correct. However there are still some "queue rwlock" references in the code. Change those to "queued rwlock" for consistency. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220510192134.434753-1-longman@redhat.com |
||
|
|
792ea6a074 |
genirq: Remove WARN_ON_ONCE() in generic_handle_domain_irq()
Since commit |
||
|
|
6d97af487d |
entry: Rename arch_check_user_regs() to arch_enter_from_user_mode()
arch_check_user_regs() is used at the moment to verify that struct pt_regs contains valid values when entering the kernel from userspace. s390 needs a place in the generic entry code to modify a cpu data structure when switching from userspace to kernel mode. As arch_check_user_regs() is exactly this, rename it to arch_enter_from_user_mode(). When entering the kernel from userspace, arch_check_user_regs() is used to verify that struct pt_regs contains valid values. Note that the NMI codepath doesn't call this function. s390 needs a place in the generic entry code to modify a cpu data structure when switching from userspace to kernel mode. As arch_check_user_regs() is exactly this, rename it to arch_enter_from_user_mode(). Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lore.kernel.org/r/20220504062351.2954280-2-tmricht@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> |