mirror of
https://github.com/torvalds/linux.git
synced 2024-11-10 22:21:40 +00:00
2966bd3698
42714 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Petr Mladek
|
2966bd3698 | Merge branch 'rework/nbcon-base' into for-linus | ||
Petr Mladek
|
86098bcdde | Merge branch 'rework/misc-cleanups' into for-linus | ||
Petr Mladek
|
adb982ad4b | Merge branch 'for-6.7' into for-linus | ||
Li kunyu
|
3f6074cf46 |
printk: printk: Remove unnecessary statements'len = 0;'
In the following two functions, len has already been assigned a value of 0 when defining the variable, so remove 'len=0;'. Signed-off-by: Li kunyu <kunyu@nfschina.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20231023062359.130633-1-kunyu@nfschina.com |
||
Petr Mladek
|
29fda1ad2a |
printk: Reduce pr_flush() pooling time
pr_flush() does not guarantee that all messages would really get flushed
to the console. The best it could do is to wait with a given timeout.[*]
The current interval 100ms for checking the progress might seem too
long in some situations. For example, such delays are not appreciated
during suspend and resume especially when the consoles have been flushed
"long" time before the check.
On the other hand, the sleeping wait might be useful in other situations.
Especially, it would allow flushing the messages using printk kthreads
on the same CPU[*].
Use msleep(1) as a compromise.
Also measure the time using jiffies. msleep() does not guarantee
precise wakeup after the given delay. It might be much longer,
especially for times < 20s. See Documentation/timers/timers-howto.rst
for more details.
Note that msecs_to_jiffies() already translates a negative value into
an infinite timeout.
[*] console_unlock() does not guarantee flushing the consoles since
the commit
|
||
Petr Mladek
|
9277abd2c1 | Merge branch 'rework/misc-cleanups' into for-linus | ||
John Ogness
|
054c22bd78 |
printk: flush consoles before checking progress
Commit |
||
John Ogness
|
98a0465531 |
printk: fix illegal pbufs access for !CONFIG_PRINTK
When CONFIG_PRINTK is not set, PRINTK_MESSAGE_MAX is 0. This leads to a zero-sized array @outbuf in @printk_shared_pbufs. In console_flush_all() a pointer to the first element of the array is assigned with: char *outbuf = &printk_shared_pbufs.outbuf[0]; For !CONFIG_PRINTK this leads to a compiler warning: warning: array subscript 0 is outside array bounds of 'char[0]' [-Warray-bounds] This is not really dangerous because printk_get_next_message() always returns false for !CONFIG_PRINTK, which leads to @outbuf never being used. However, it makes no sense to even compile these functions for !CONFIG_PRINTK. Extend the existing '#ifdef CONFIG_PRINTK' block to contain the formatting and emitting functions since these have no purpose in !CONFIG_PRINTK. This also allows removing several more !CONFIG_PRINTK dummies as well as moving @suppress_panic_printk into a CONFIG_PRINTK block. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202309201724.M9BMAQIh-lkp@intel.com/ Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230920155238.670439-1-john.ogness@linutronix.de |
||
Thomas Gleixner
|
9757acd0a7 |
printk: nbcon: Allow drivers to mark unsafe regions and check state
For the write_atomic callback, the console driver may have unsafe regions that need to be appropriately marked. Provide functions that accept the nbcon_write_context struct to allow for the driver to enter and exit unsafe regions. Also provide a function for drivers to check if they are still the owner of the console. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-9-john.ogness@linutronix.de |
||
Thomas Gleixner
|
06653d57ff |
printk: nbcon: Add emit function and callback function for atomic printing
Implement an emit function for nbcon consoles to output printk messages. It utilizes the lockless printk_get_next_message() and console_prepend_dropped() functions to retrieve/build the output message. The emit function includes the required safety points to check for handover/takeover and calls a new write_atomic callback of the console driver to output the message. It also includes proper handling for updating the nbcon console sequence number. A new nbcon_write_context struct is introduced. This is provided to the write_atomic callback and includes only the information necessary for performing atomic writes. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-8-john.ogness@linutronix.de |
||
Thomas Gleixner
|
ad56ebd1d7 |
printk: nbcon: Add sequence handling
Add an atomic_long_t field @nbcon_seq to the console struct to store the sequence number for nbcon consoles. For nbcon consoles this will be used instead of the non-atomic @seq field. The new field allows for safe atomic sequence number updates without requiring any locking. On 64bit systems the new field stores the full sequence number. On 32bit systems the new field stores the lower 32 bits of the sequence number, which are expanded to 64bit as needed by folding the values based on the sequence numbers available in the ringbuffer. For 32bit systems, having a 32bit representation in the console is sufficient. If a console ever gets more than 2^31 records behind the ringbuffer then this is the least of the problems. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-7-john.ogness@linutronix.de |
||
Thomas Gleixner
|
4b08d9e24f |
printk: nbcon: Add ownership state functions
Provide functions that are related to the safe handover mechanism and allow console drivers to dynamically specify unsafe regions: - nbcon_context_can_proceed() Invoked by a console owner to check whether a handover request is pending or whether the console has been taken over by another context. If a handover request is pending, this function will also perform the handover, thus cancelling its own ownership. - nbcon_context_enter_unsafe()/nbcon_context_exit_unsafe() Invoked by a console owner to denote that the driver is about to enter or leave a critical region where a take over is unsafe. The exit variant is the point where the current owner releases the lock for a higher priority context which asked for the friendly handover. The unsafe state is stored in the console state and allows a new context to make informed decisions whether to attempt a takeover of such a console. The unsafe state is also available to the driver so that it can make informed decisions about the required actions and possibly take a special emergency path. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-6-john.ogness@linutronix.de |
||
Thomas Gleixner
|
5634c90fd8 |
printk: nbcon: Add buffer management
In case of hostile takeovers it must be ensured that the previous owner cannot scribble over the output buffer of the emergency/panic context. This is achieved by: - Adding a global output buffer instance for the panic context. This is the only situation where hostile takeovers can occur and there is always at most 1 panic context. - Allocating an output buffer per non-boot console upon console registration. This buffer is used by the console owner when not in panic context. (For boot consoles, the existing shared global legacy output buffer is used instead. Boot console printing will be synchronized with legacy console printing.) - Choosing the appropriate buffer is handled in the acquire/release functions. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-5-john.ogness@linutronix.de |
||
John Ogness
|
d818b56f77 |
printk: Make static printk buffers available to nbcon
The nbcon boot consoles also need printk buffers that are available very early. Since the nbcon boot consoles will also be serialized by the console_lock, they can use the same static printk buffers that the legacy consoles are using. Make the legacy static printk buffers available outside of printk.c so they can be used by nbcon.c. Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-4-john.ogness@linutronix.de |
||
Thomas Gleixner
|
3a5bb25162 |
printk: nbcon: Add acquire/release logic
Add per console acquire/release functionality. The state of the console is maintained in the "nbcon_state" atomic variable. The console is locked when: - The 'prio' field contains the priority of the context that owns the console. Only higher priority contexts are allowed to take over the lock. A value of 0 (NBCON_PRIO_NONE) means the console is not locked. - The 'cpu' field denotes on which CPU the console is locked. It is used to prevent busy waiting on the same CPU. Also it informs the lock owner that it has lost the lock in a more complex scenario when the lock was taken over by a higher priority context, released, and taken on another CPU with the same priority as the interrupted owner. The acquire mechanism uses a few more fields: - The 'req_prio' field is used by the handover approach to make the current owner aware that there is a context with a higher priority waiting for the friendly handover. - The 'unsafe' field allows to take over the console in a safe way in the middle of emitting a message. The field is set only when accessing some shared resources or when the console device is manipulated. It can be cleared, for example, after emitting one character when the console device is in a consistent state. - The 'unsafe_takeover' field is set when a hostile takeover took the console in an unsafe state. The console will stay in the unsafe state until re-initialized. The acquire mechanism uses three approaches: 1) Direct acquire when the console is not owned or is owned by a lower priority context and is in a safe state. 2) Friendly handover mechanism uses a request/grant handshake. It is used when the current owner has lower priority and the console is in an unsafe state. The requesting context: a) Sets its priority into the 'req_prio' field. b) Waits (with a timeout) for the owning context to unlock the console. c) Takes the lock and clears the 'req_prio' field. The owning context: a) Observes the 'req_prio' field set on exit from the unsafe console state. b) Gives up console ownership by clearing the 'prio' field. 3) Unsafe hostile takeover allows to take over the lock even when the console is an unsafe state. It is used only in panic() by the final attempt to flush consoles in a try and hope mode. Note that separate record buffers are used in panic(). As a result, the messages can be read and formatted without any risk even after using the hostile takeover in unsafe state. The release function simply clears the 'prio' field. All operations on @console::nbcon_state are atomic cmpxchg based to handle concurrency. The acquire/release functions implement only minimal policies: - Preference for higher priority contexts. - Protection of the panic CPU. All other policy decisions must be made at the call sites: - What is marked as an unsafe section. - Whether to spin-wait if there is already an owner and the console is in an unsafe state. - Whether to attempt an unsafe hostile takeover. The design allows to implement the well known: acquire() output_one_printk_record() release() The output of one printk record might be interrupted with a higher priority context. The new owner is supposed to reprint the entire interrupted record from scratch. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-3-john.ogness@linutronix.de |
||
Thomas Gleixner
|
6b93bb41f6 |
printk: Add non-BKL (nbcon) console basic infrastructure
The current console/printk subsystem is protected by a Big Kernel Lock, (aka console_lock) which has ill defined semantics and is more or less stateless. This puts severe limitations on the console subsystem and makes forced takeover and output in emergency and panic situations a fragile endeavour that is based on try and pray. The goal of non-BKL (nbcon) consoles is to break out of the console lock jail and to provide a new infrastructure that avoids the pitfalls and also allows console drivers to be gradually converted over. The proposed infrastructure aims for the following properties: - Per console locking instead of global locking - Per console state that allows to make informed decisions - Stateful handover and takeover As a first step, state is added to struct console. The per console state is an atomic_t using a 32bit bit field. Reserve state bits, which will be populated later in the series. Wire it up into the console register/unregister functionality. It was decided to use a bitfield because using a plain u32 with mask/shift operations resulted in uncomprehensible code. Co-developed-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: John Ogness <john.ogness@linutronix.de> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230916192007.608398-2-john.ogness@linutronix.de |
||
Linus Torvalds
|
01a46efcd8 |
printk fixup for 6.6
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmT6xsgACgkQUqAMR0iA lPIpeA/+P8Jq25Hptd+Zh5XxkNO0lHKt8o9qZxgLwyA0Mh1YWLCTrmliqeK6jTWy CrKXkeyZmOQxzTFrflEN9OVQ/NnIBZBdiT6TWl64pI87pc+ElS+awC1BxS9gKDRC vhhwiMFWHt4DWlVhjKA+ARShYI/uI5+b/Ewlmg4ZeksWxWcFfnoZb0BCkQKGZcN+ 9G1C1mPtxV4lT1FJAglkgx3hF3+BcpX9EEVYdjdpQbD9J0jmlTP08/w+yyMvTzjs BVRPtcPeq0eb5iFp06SJKqa4377j6N9KMKtZG5IjzbkU/N6u7C4hXYxiGM/nOmyC 022/ZuFP6uwoeOiWBfuPJK9cadsomMbqeSJxC8wh/eLRqgTKU7N6wt8ybSSNynrC oMzdEI+ovjYIVrb13ZFDE+YFsXCzNhw1xNMmxdJMGQeeFNVkjK5MV1yaGdisffgj ps3eJbdaklFgU1m2GUkoVKglLeiYsihyvnSDZuxgfe+12GPReE0eTjvyj8gRUOd3 DLd1GPI7gmUv0c3k9VHyLu/ATrlBB9BvZ4cT2+anrC4Trf8Al2E8xgkzhYceqMOj 6XZFoGUGW+nhxU0vGy1psYCyw1k4L71vcT/WJD9ul+6RoHvwAnbDhmfu00W4LJ7C YwMheIc+00v4ofu/oXt32DhVFuAy06VdGiC4LYwWKkaPJELatPA= =bhor -----END PGP SIGNATURE----- Merge tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk fix from Petr Mladek: - Revert exporting symbols needed for dumping the raw printk buffer in panic(). I pushed the export prematurely before the user was ready for merging into the mainline. * tag 'printk-for-6.6-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: Revert "printk: export symbols for debug modules" |
||
Linus Torvalds
|
73be7fb14e |
Including fixes from netfilter and bpf.
Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmT6R6wACgkQMUZtbf5S IrsmTg//TgmRjxSZ0lrPQtJwZR/eN3ZR2oQG3rwnssCx+YgHEGGxQsfT4KHEMacR ZgGDZVTpthUJkkACBPi8ZMoy++RdjEmlCcanfeDkGHoYGtiX1lhkofhLMn1KUHbI rIbP9EdNKxQT0SsBlw/U28pD5jKyqOgL23QobEwmcjLTdMpamb+qIsD6/xNv9tEj Tu4BdCIkhjxnBD622hsE3pFTG7oSn2WM6rf5NT1E43mJ3W8RrMcydSB27J7Oryo9 l3nYMAhz0vQINS2WQ9eCT1/7GI6gg1nDtxFtrnV7ASvxayRBPIUr4kg1vT+Tixsz CZMnwVamEBIYl9agmj7vSji7d5nOUgXPhtWhwWUM2tRoGdeGw3vSi1pgDvRiUCHE PJ4UHv7goa2AgnOlOQCFtRybAu+9nmSGm7V+GkeGLnH7xbFsEa5smQ/+FSPJs8Dn Yf4q5QAhdN8tdnofRlrN/nCssoDF3cfmBsTJ7wo5h71gW+BWhsP58eDCJlXd/r8k +Qnvoe2kw27ktFR1tjsUDZ0AcSmeVARNwmXCOBYZsG4tEek8pLyj008mDvJvdfyn PGPn7Eo5DyaERlHVmPuebHXSyniDEPe2GLTmlHcGiRpGspoUHbB+HRiDAuRLMB9g pkL8RHpNfppnuUXeUoNy3rgEkYwlpTjZX0QHC6N8NQ76ccB6CNM= =YpmE -----END PGP SIGNATURE----- Merge tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking updates from Jakub Kicinski: "Including fixes from netfilter and bpf. Current release - regressions: - eth: stmmac: fix failure to probe without MAC interface specified Current release - new code bugs: - docs: netlink: fix missing classic_netlink doc reference Previous releases - regressions: - deal with integer overflows in kmalloc_reserve() - use sk_forward_alloc_get() in sk_get_meminfo() - bpf_sk_storage: fix the missing uncharge in sk_omem_alloc - fib: avoid warn splat in flow dissector after packet mangling - skb_segment: call zero copy functions before using skbuff frags - eth: sfc: check for zero length in EF10 RX prefix Previous releases - always broken: - af_unix: fix msg_controllen test in scm_pidfd_recv() for MSG_CMSG_COMPAT - xsk: fix xsk_build_skb() dereferencing possible ERR_PTR() - netfilter: - nft_exthdr: fix non-linear header modification - xt_u32, xt_sctp: validate user space input - nftables: exthdr: fix 4-byte stack OOB write - nfnetlink_osf: avoid OOB read - one more fix for the garbage collection work from last release - igmp: limit igmpv3_newpack() packet size to IP_MAX_MTU - bpf, sockmap: fix preempt_rt splat when using raw_spin_lock_t - handshake: fix null-deref in handshake_nl_done_doit() - ip: ignore dst hint for multipath routes to ensure packets are hashed across the nexthops - phy: micrel: - correct bit assignments for cable test errata - disable EEE according to the KSZ9477 errata Misc: - docs/bpf: document compile-once-run-everywhere (CO-RE) relocations - Revert "net: macsec: preserve ingress frame ordering", it appears to have been developed against an older kernel, problem doesn't exist upstream" * tag 'net-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: enetc: distinguish error from valid pointers in enetc_fixup_clear_rss_rfs() Revert "net: team: do not use dynamic lockdep key" net: hns3: remove GSO partial feature bit net: hns3: fix the port information display when sfp is absent net: hns3: fix invalid mutex between tc qdisc and dcb ets command issue net: hns3: fix debugfs concurrency issue between kfree buffer and read net: hns3: fix byte order conversion issue in hclge_dbg_fd_tcam_read() net: hns3: Support query tx timeout threshold by debugfs net: hns3: fix tx timeout issue net: phy: Provide Module 4 KSZ9477 errata (DS80000754C) netfilter: nf_tables: Unbreak audit log reset netfilter: ipset: add the missing IP_SET_HASH_WITH_NET0 macro for ip_set_hash_netportnet.c netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction netfilter: nf_tables: uapi: Describe NFTA_RULE_CHAIN_ID netfilter: nfnetlink_osf: avoid OOB read netfilter: nftables: exthdr: fix 4-byte stack OOB write selftests/bpf: Check bpf_sk_storage has uncharged sk_omem_alloc bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc bpf: bpf_sk_storage: Fix invalid wait context lockdep report s390/bpf: Pass through tail call counter in trampolines ... |
||
Christoph Hellwig
|
4952801fc6 |
Revert "printk: export symbols for debug modules"
This reverts commit
|
||
Martin KaFai Lau
|
55d49f750b |
bpf: bpf_sk_storage: Fix the missing uncharge in sk_omem_alloc
The commit |
||
Martin KaFai Lau
|
a96a44aba5 |
bpf: bpf_sk_storage: Fix invalid wait context lockdep report
'./test_progs -t test_local_storage' reported a splat:
[ 27.137569] =============================
[ 27.138122] [ BUG: Invalid wait context ]
[ 27.138650] 6.5.0-03980-gd11ae1b16b0a #247 Tainted: G O
[ 27.139542] -----------------------------
[ 27.140106] test_progs/1729 is trying to lock:
[ 27.140713] ffff8883ef047b88 (stock_lock){-.-.}-{3:3}, at: local_lock_acquire+0x9/0x130
[ 27.141834] other info that might help us debug this:
[ 27.142437] context-{5:5}
[ 27.142856] 2 locks held by test_progs/1729:
[ 27.143352] #0: ffffffff84bcd9c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x4/0x40
[ 27.144492] #1: ffff888107deb2c0 (&storage->lock){..-.}-{2:2}, at: bpf_local_storage_update+0x39e/0x8e0
[ 27.145855] stack backtrace:
[ 27.146274] CPU: 0 PID: 1729 Comm: test_progs Tainted: G O 6.5.0-03980-gd11ae1b16b0a #247
[ 27.147550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 27.149127] Call Trace:
[ 27.149490] <TASK>
[ 27.149867] dump_stack_lvl+0x130/0x1d0
[ 27.152609] dump_stack+0x14/0x20
[ 27.153131] __lock_acquire+0x1657/0x2220
[ 27.153677] lock_acquire+0x1b8/0x510
[ 27.157908] local_lock_acquire+0x29/0x130
[ 27.159048] obj_cgroup_charge+0xf4/0x3c0
[ 27.160794] slab_pre_alloc_hook+0x28e/0x2b0
[ 27.161931] __kmem_cache_alloc_node+0x51/0x210
[ 27.163557] __kmalloc+0xaa/0x210
[ 27.164593] bpf_map_kzalloc+0xbc/0x170
[ 27.165147] bpf_selem_alloc+0x130/0x510
[ 27.166295] bpf_local_storage_update+0x5aa/0x8e0
[ 27.167042] bpf_fd_sk_storage_update_elem+0xdb/0x1a0
[ 27.169199] bpf_map_update_value+0x415/0x4f0
[ 27.169871] map_update_elem+0x413/0x550
[ 27.170330] __sys_bpf+0x5e9/0x640
[ 27.174065] __x64_sys_bpf+0x80/0x90
[ 27.174568] do_syscall_64+0x48/0xa0
[ 27.175201] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 27.175932] RIP: 0033:0x7effb40e41ad
[ 27.176357] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d8
[ 27.179028] RSP: 002b:00007ffe64c21fc8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[ 27.180088] RAX: ffffffffffffffda RBX: 00007ffe64c22768 RCX: 00007effb40e41ad
[ 27.181082] RDX: 0000000000000020 RSI: 00007ffe64c22008 RDI: 0000000000000002
[ 27.182030] RBP: 00007ffe64c21ff0 R08: 0000000000000000 R09: 00007ffe64c22788
[ 27.183038] R10: 0000000000000064 R11: 0000000000000202 R12: 0000000000000000
[ 27.184006] R13: 00007ffe64c22788 R14: 00007effb42a1000 R15: 0000000000000000
[ 27.184958] </TASK>
It complains about acquiring a local_lock while holding a raw_spin_lock.
It means it should not allocate memory while holding a raw_spin_lock
since it is not safe for RT.
raw_spin_lock is needed because bpf_local_storage supports tracing
context. In particular for task local storage, it is easy to
get a "current" task PTR_TO_BTF_ID in tracing bpf prog.
However, task (and cgroup) local storage has already been moved to
bpf mem allocator which can be used after raw_spin_lock.
The splat is for the sk storage. For sk (and inode) storage,
it has not been moved to bpf mem allocator. Using raw_spin_lock or not,
kzalloc(GFP_ATOMIC) could theoretically be unsafe in tracing context.
However, the local storage helper requires a verifier accepted
sk pointer (PTR_TO_BTF_ID), it is hypothetical if that (mean running
a bpf prog in a kzalloc unsafe context and also able to hold a verifier
accepted sk pointer) could happen.
This patch avoids kzalloc after raw_spin_lock to silent the splat.
There is an existing kzalloc before the raw_spin_lock. At that point,
a kzalloc is very likely required because a lookup has just been done
before. Thus, this patch always does the kzalloc before acquiring
the raw_spin_lock and remove the later kzalloc usage after the
raw_spin_lock. After this change, it will have a charge and then
uncharge during the syscall bpf_map_update_elem() code path.
This patch opts for simplicity and not continue the old
optimization to save one charge and uncharge.
This issue is dated back to the very first commit of bpf_sk_storage
which had been refactored multiple times to create task, inode, and
cgroup storage. This patch uses a Fixes tag with a more recent
commit that should be easier to do backport.
Fixes:
|
||
Sebastian Andrzej Siewior
|
6764e767f4 |
bpf: Assign bpf_tramp_run_ctx::saved_run_ctx before recursion check.
__bpf_prog_enter_recur() assigns bpf_tramp_run_ctx::saved_run_ctx before
performing the recursion check which means in case of a recursion
__bpf_prog_exit_recur() uses the previously set bpf_tramp_run_ctx::saved_run_ctx
value.
__bpf_prog_enter_sleepable_recur() assigns bpf_tramp_run_ctx::saved_run_ctx
after the recursion check which means in case of a recursion
__bpf_prog_exit_sleepable_recur() uses an uninitialized value. This does not
look right. If I read the entry trampoline code right, then bpf_tramp_run_ctx
isn't initialized upfront.
Align __bpf_prog_enter_sleepable_recur() with __bpf_prog_enter_recur() and
set bpf_tramp_run_ctx::saved_run_ctx before the recursion check is made.
Remove the assignment of saved_run_ctx in kern_sys_bpf() since it happens
a few cycles later.
Fixes:
|
||
Sebastian Andrzej Siewior
|
7645629f7d |
bpf: Invoke __bpf_prog_exit_sleepable_recur() on recursion in kern_sys_bpf().
If __bpf_prog_enter_sleepable_recur() detects recursion then it returns
0 without undoing rcu_read_lock_trace(), migrate_disable() or
decrementing the recursion counter. This is fine in the JIT case because
the JIT code will jump in the 0 case to the end and invoke the matching
exit trampoline (__bpf_prog_exit_sleepable_recur()).
This is not the case in kern_sys_bpf() which returns directly to the
caller with an error code.
Add __bpf_prog_exit_sleepable_recur() as clean up in the recursion case.
Fixes:
|
||
Linus Torvalds
|
61401a8724 |
Kbuild updates for v6.6
- Enable -Wenum-conversion warning option - Refactor the rpm-pkg target - Fix scripts/setlocalversion to consider annotated tags for rt-kernel - Add a jump key feature for the search menu of 'make nconfig' - Support Qt6 for 'make xconfig' - Enable -Wformat-overflow, -Wformat-truncation, -Wstringop-overflow, and -Wrestrict warnings for W=1 builds - Replace <asm/export.h> with <linux/export.h> for alpha, ia64, and sparc - Support DEB_BUILD_OPTIONS=parallel=N for the debian source package - Refactor scripts/Makefile.modinst and fix some modules_sign issues - Add a new Kconfig env variable to warn symbols that are not defined anywhere - Show help messages of config fragments in 'make help' -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmT3X/oVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsG58oQAIXDrka3r53Flky/uJjSl8ab620o XL3u4PF/ekv6qsZoLlU24WQP8BzcJO6gPHFz88mE9/J1+wHpNKZLZehjpgj1cCY3 LatbEAa3DCZPC/c7P/nz+FT4mjTZpKOeQmvZVfA+xonBHmTyVUKgws0uDB/xuTjE GARyOX7ymD0AAZv84SUUCiaBe5Y2Bkrki67HfteS4bxW8GHg0rZWzrFUUkEkoG54 elNOYR0WYROwyo8Iokd2MedVdK2SPZxvY8i67hXl2K+Qve6tLNk8dbRIENnYI0pk 7oQVmIfC20eu9CteywHlyjt8jpTOeIrRc2yhJKR0YrjjIzKhulRGMh+pFAAwoySd Se60uWCS2AydcXWTrtb+iwFUyM2zRK4SaMlxleqnoE/bWYp6jhg9qbV9xpztWSYI j39k9aX7B19stN1drzJeyXdILRVtaAQCcax3RR+mGgm4Z5fuTDntPepvIv8J3lBg QZ4MCdOdtFw33eQaKa7O3LddD3q1X355xeaIITivEe3rAk5iIJYu3Ty1VY+/XTcH ktSVl83zQ5Ge3tvx8D6kCR9J8jAQyTLIKHxvr/j969HgZKguS2i37eChnPyKcu23 ZMKJcmCJ1O7naQXVrb/TeiaMR0UEo/PSdrUjpEO3LlMpRthNXLVSLfgJGv8WLO7/ pb/HFXHgKaSORiRV =lfUi -----END PGP SIGNATURE----- Merge tag 'kbuild-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Enable -Wenum-conversion warning option - Refactor the rpm-pkg target - Fix scripts/setlocalversion to consider annotated tags for rt-kernel - Add a jump key feature for the search menu of 'make nconfig' - Support Qt6 for 'make xconfig' - Enable -Wformat-overflow, -Wformat-truncation, -Wstringop-overflow, and -Wrestrict warnings for W=1 builds - Replace <asm/export.h> with <linux/export.h> for alpha, ia64, and sparc - Support DEB_BUILD_OPTIONS=parallel=N for the debian source package - Refactor scripts/Makefile.modinst and fix some modules_sign issues - Add a new Kconfig env variable to warn symbols that are not defined anywhere - Show help messages of config fragments in 'make help' * tag 'kbuild-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (62 commits) kconfig: fix possible buffer overflow kbuild: Show marked Kconfig fragments in "help" kconfig: add warn-unknown-symbols sanity check kbuild: dummy-tools: make MPROFILE_KERNEL checks work on BE Documentation/llvm: refresh docs modpost: Skip .llvm.call-graph-profile section check kbuild: support modules_sign for external modules as well kbuild: support 'make modules_sign' with CONFIG_MODULE_SIG_ALL=n kbuild: move more module installation code to scripts/Makefile.modinst kbuild: reduce the number of mkdir calls during modules_install kbuild: remove $(MODLIB)/source symlink kbuild: move depmod rule to scripts/Makefile.modinst kbuild: add modules_sign to no-{compiler,sync-config}-targets kbuild: do not run depmod for 'make modules_sign' kbuild: deb-pkg: support DEB_BUILD_OPTIONS=parallel=N in debian/rules alpha: remove <asm/export.h> alpha: replace #include <asm/export.h> with #include <linux/export.h> ia64: remove <asm/export.h> ia64: replace #include <asm/export.h> with #include <linux/export.h> sparc: remove <asm/export.h> ... |
||
Linus Torvalds
|
3c31041e37 |
printk changes for 6.6
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmT1pbAACgkQUqAMR0iA lPLoxBAAl18gKo6C8zIBNBoYNl7FxvChrJORjK7RQONs5RYKt8drHjSrJGazhjiV TIdbZt9juqs+UT/f6DnkJznrqQ0W70fQsgUpw+q7n7+cnkIoXAiAs+plexdQXigB 6nx67b2oub41jTwzn/uV7R/eTwq2VnoZqudS/o9iAI/Ia9JzkqmGx08hQedvOoqX 2SWs140iY/zXsFUyEfe8UTXwJUnC/n9pwtpe5sLpmtyupGc/KumUimTQ+LFJbV9o n8QhcQn10CE93M5fH/R2JXjZO7wuSmCHt/V8oSnoOwdCBBe7Tc6aBx5wUwc4XCuC 450h5hlYBKq97lA1PnWGC01uAkeDTRw8963LVRRqWvohoFuHXR0cisF9FTM9LXfs bg90XjzYAK7Ns9fJ0dZHOSbUtRaa06hiExUnO3ctyv2G6h8qKfv86LCuP0CMFmQU rflfk1dPiMW20HT3ZJNtMCy3Vu6kVQSdSaGKTnwzTcUWop5tCQxhmWYBXH6q/1LH aD7xT1xJnBGqLUqq5C8twoOea+w47x/vtjTLi7mJarP5Wfh8xv6axdkwE8N4NrYp cc2RR83a1BZ7At3YkAjfjHmhaZ97gSSe6+Yqk9UzvUEQY/WILEGnb0DKO1jCSB34 D2NPh1MHF5MFQjazdt/dSyMJVxDlTeY/S5wqfLLhNZts48LJ8n0= =D7ZU -----END PGP SIGNATURE----- Merge tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux Pull printk updates from Petr Mladek: - Do not try to get the console lock when it is not need or useful in panic() - Replace the global console_suspended state by a per-console flag - Export symbols needed for dumping the raw printk buffer in panic() - Fix documentation of printf formats for integer types - Moved Sergey Senozhatsky to the reviewer role - Misc cleanups * tag 'printk-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: printk: export symbols for debug modules lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix() printk: ringbuffer: Fix truncating buffer size min_t cast printk: Rename abandon_console_lock_in_panic() to other_cpu_in_panic() printk: Add per-console suspended state printk: Consolidate console deferred printing printk: Do not take console lock for console_flush_on_panic() printk: Keep non-panic-CPUs out of console lock printk: Reduce console_unblank() usage in unsafe scenarios kdb: Do not assume write() callback available docs: printk-formats: Treat char as always unsigned docs: printk-formats: Fix hex printing of signed values MAINTAINERS: adjust printk/vsprintf entries |
||
Petr Mladek
|
f0f6923953 | Merge branch 'rework/misc-cleanups' into for-linus | ||
Kees Cook
|
feec5e1f74 |
kbuild: Show marked Kconfig fragments in "help"
Currently the Kconfig fragments in kernel/configs and arch/*/configs that aren't used internally aren't discoverable through "make help", which consists of hard-coded lists of config fragments. Instead, list all the fragment targets that have a "# Help: " comment prefix so the targets can be generated dynamically. Add logic to the Makefile to search for and display the fragment and comment. Add comments to fragments that are intended to be direct targets. Signed-off-by: Kees Cook <keescook@chromium.org> Co-developed-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Reviewed-by: Nicolas Schier <nicolas@fjasle.eu> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> |
||
Linus Torvalds
|
b70100f2e6 |
Probes updates for v6.6:
- kprobes: use struct_size() for variable size kretprobe_instance data structure. - eprobe: Simplify trace_eprobe list iteration. - probe events: Data structure field access support on BTF argument. . Update BTF argument support on the functions in the kernel loadable modules (only loaded modules are supported). . Move generic BTF access function (search function prototype and get function parameters) to a separated file. . Add a function to search a member of data structure in BTF. . Support accessing BTF data structure member from probe args by C-like arrow('->') and dot('.') operators. e.g. 't sched_switch next=next->pid vruntime=next->se.vruntime' . Support accessing BTF data structure member from $retval. e.g. 'f getname_flags%return +0($retval->name):string' . Add string type checking if BTF type info is available. This will reject if user specify ":string" type for non "char pointer" type. . Automatically assume the fprobe event as a function return event if $retval is used. - selftests/ftrace: Add BTF data field access test cases. - Documentation: Update fprobe event example with BTF data field. -----BEGIN PGP SIGNATURE----- iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmTycQkbHG1hc2FtaS5o aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bqS8H/jeR1JhOzIXOvTw7XCFm MrSY/SKi8tQfV6lau2UmoYdbYvYjpqL34XLOQPNf2/lrcL2M9aNYXk9fbhlW8enx vkMyKQ0E5anixkF4vsTbEl9DaprxbpsPVACmZ/7VjQk2JuXIdyaNk8hno9LgIcEq udztb0o2HmDFqAXfRi0LvlSTAIwvXZ+usmEvYpaq1g2WwrCe7NHEYl42vMpj+h4H 9l4t5rA9JyPPX4yQUjtKGW5eRVTwDTm/Gn6DRzYfYzkkiBZv27qfovzBOt672LgG hyot+u7XeKvZx3jjnF7+mRWoH/m0dqyhyi/nPhpIE09VhgwclrbGAcDuR1x6sp01 PHY= =hBDN -----END PGP SIGNATURE----- Merge tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes updates from Masami Hiramatsu: - kprobes: use struct_size() for variable size kretprobe_instance data structure. - eprobe: Simplify trace_eprobe list iteration. - probe events: Data structure field access support on BTF argument. - Update BTF argument support on the functions in the kernel loadable modules (only loaded modules are supported). - Move generic BTF access function (search function prototype and get function parameters) to a separated file. - Add a function to search a member of data structure in BTF. - Support accessing BTF data structure member from probe args by C-like arrow('->') and dot('.') operators. e.g. 't sched_switch next=next->pid vruntime=next->se.vruntime' - Support accessing BTF data structure member from $retval. e.g. 'f getname_flags%return +0($retval->name):string' - Add string type checking if BTF type info is available. This will reject if user specify ":string" type for non "char pointer" type. - Automatically assume the fprobe event as a function return event if $retval is used. - selftests/ftrace: Add BTF data field access test cases. - Documentation: Update fprobe event example with BTF data field. * tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: Documentation: tracing: Update fprobe event example with BTF field selftests/ftrace: Add BTF fields access testcases tracing/fprobe-event: Assume fprobe is a return event by $retval tracing/probes: Add string type check with BTF tracing/probes: Support BTF field access from $retval tracing/probes: Support BTF based data structure field access tracing/probes: Add a function to search a member of a struct/union tracing/probes: Move finding func-proto API and getting func-param API to trace_btf tracing/probes: Support BTF argument on module functions tracing/eprobe: Iterate trace_eprobe directly kernel: kprobes: Use struct_size() |
||
Linus Torvalds
|
e021c5f1f6 |
Tracing fixes and clean ups for 6.6:
- Replace strlcpy() with strscpy() - Initialize the pipe cpumask to zero on allocation - Use within_module() instead of open coding it - Remove extra space in hwlat_detectory/mode output - Use LIST_HEAD() instead of open coding it - A bunch of clean ups and fixes for the cpumask filter - Set local da_mon_##name to static - Fix race in snapshot buffer between cpu write and swap -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZPMsBhQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qiToAP49yXVK6seGUwU18QSp4mCNa0QNSH0v bl2UYVSCPv8aNQEAquDOvGInbMcT2z69lK359TVlGPrtVjhqFDloSLMYgAo= =DTGo -----END PGP SIGNATURE----- Merge tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull more tracing updates from Steven Rostedt: "Tracing fixes and clean ups: - Replace strlcpy() with strscpy() - Initialize the pipe cpumask to zero on allocation - Use within_module() instead of open coding it - Remove extra space in hwlat_detectory/mode output - Use LIST_HEAD() instead of open coding it - A bunch of clean ups and fixes for the cpumask filter - Set local da_mon_##name to static - Fix race in snapshot buffer between cpu write and swap" * tag 'trace-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/filters: Fix coding style issues tracing/filters: Change parse_pred() cpulist ternary into an if block tracing/filters: Fix double-free of struct filter_pred.mask tracing/filters: Fix error-handling of cpulist parsing buffer tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY ftrace: Use LIST_HEAD to initialize clear_hash ftrace: Use within_module to check rec->ip within specified module. tracing: Replace strlcpy with strscpy in trace/events/task.h tracing: Fix race issue between cpu buffer write and swap tracing: Remove extra space at the end of hwlat_detector/mode rv: Set variable 'da_mon_##name' to static |
||
Linus Torvalds
|
a6216978de |
Fix false positive "softirq work is pending" messages on -rt
kernels, caused by a buggy factoring-out of existing code. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzC5oRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1hG1Q/+ICGbpxdQOrVg7QTLzgsxttxIyi4Un6lb vPX8NO9/4HIxObR6bd+ji2499TIO6nIhRqGOzEYUe9jzEN27eM/bMo6kCcRkbWra 4V/GZd3j+XdJwIQR442cBdUcByk4X7FlE7KqizJIbvYYyLBXzboBcpOdH012e2O9 UzFjtU+pk5Lhit18jL6/AvjsMhneKb6YUH20Wbb6zjZ1FL28YGKpeOHrh6GSXlKE GVS07pWSAB8TMXdO+8YaKoE7VIOdMaYS/mJJ6u/M8Wo+Kl0wWwmJtjmSYzvD2Uod PGcCiGXr1QpWK66wZNnXjs3rb6bX5umCo8rc5L6rqvWTYvB8Owpl5V94+87yGEov 29lYvWdVJ7dPqP8fSQfYxBKbgfINwOO1STYnIX1Q5mDD9fK2SgOpD9+JFagYnJoI 5n6KoVArVHQXSB4odTn+Qyt0yu0iDubUFRxBTrWijq5ooHOExaxByl0ViyCfp1aS csTcGQSJsvHKhZPejDggjp74IU/ge5lUN4uSFlPVo3jYFwUIIgBG+43QtFiVrplg 3ifpI2qNISQl65PRerZjB5jBmItUGnUl71tnEg/Cli7zvvw/nMeKh98vChtE9S3A 2eQ66rrV9eJAeYaNCV4Uz1UmocD4i2Vec9tZOUUoIga/bDIOVr+bxUr7nvcOneak 98h2ylU4W8o= =zpfn -----END PGP SIGNATURE----- Merge tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix false positive 'softirq work is pending' messages on -rt kernels, caused by a buggy factoring-out of existing code" * tag 'timers-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tick/rcu: Fix false positive "softirq work is pending" messages |
||
Linus Torvalds
|
23dfeae882 |
Fix a CPU hotplug related deadlock between the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzCnoRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1h9gQ//fOQT0OrAUwhAdW7IZcQGETdSykxYqRXT OciNPVUirJOXJM7tG08OEUUAiRrbIALHBRkfNk/ycOWTfa2qsur3jyGgyi8cnKo9 NmdNltRMZ2UbKlJxzoeu1wIqWkmoLaYloVp3YWXgPClclNbBROCvXvEHnEr1iRtA trfEjNxEYgKeDkJROg0Av3RQTzLgZ3TqZ67mzJVZbCbz9i/IxicJa4PNuzrkw3c3 q42Btx+Ru1ikl/Jww0asX4iESFxuUk3Aw7DBX7slaLMrLcPMKsbO2D3npSxLFTCP TUdMKoIanVjl5+a2//kT8TkV+M1OKvczy6AYH0pV/yZLkAQqJmLphVsEI6rMIdp2 ep26hrjaLlhp3dTr8jNQ86BlxT6zqP1/+OpC4BbKFK2HLJj7sGKcb5W5WMdhB/Qh tA+CgVZXJDHkH2m2zD6o+SDm5JvbbHOLywfBBUSggHDDq3oOrxdjS2g8tgFwtnJ2 ZxjvJ4Ot3M26b44qkQbJeG42Q7ciLDrfaOZhlZ6bt30agU4EP3bg4dZAL24EoPLY zdom++puL+nUBr6EvzbboVxisuf0cvDbujmuFRQdntRRy8oHgiQVhb+b4EWh0oOc CKN06nyA9z5MzhAek3/GuxMYKEWM9/Dy6rDyqvaxfcbc9PIaxGfRxjgpKxrdRPOu rjGsQHZbTlo= =wM0O -----END PGP SIGNATURE----- Merge tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull CPU hotplug fix from Ingo Molnar: "Fix a CPU hotplug related deadlock between the task which initiates and controls a CPU hot-unplug operation vs. the CFS bandwidth timer" * tag 'smp-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: cpu/hotplug: Prevent self deadlock on CPU hot-unplug |
||
Linus Torvalds
|
c39cbc5b60 |
Miscellaneous scheduler fixes: a reporting fix, a static symbol fix,
and a kernel-doc fix. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTzCWcRHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1ifZQ//SvKEhKT1lolh4bmMZAaRHWJBq8omH1V+ 36k5Jd3AOJcIEVJD0h+6yfJH2mlS6ZGW3te33VhW5z4c2dMBms90qMLv6xdr/E7j Pseud3bc6o9SHPA8v9oNKy9GTcnD/kKXxr7f8tabJxxewzUY7EkHa4lJ1AgOIzDP njWIVqqVFqoO1QjjKCN1ERuMU6ifX+6bcSik89f9F3Gg8KhUMbmv2+O6Jd22wwWC mI/atl2EdkJg0VlFNIZtVk6n+hwbBaPfkd76ihQ/82MaLo1M7PilO5mtpgUNUCMh XLlekYwFewUJP+xGkTg1FG8A2B937EXpPdO/8F4vFU/PhDeev8fIG99MIOo3h6A4 nlaKU/Lh9NFT/64wfP5/b8ud/UEf/7YhD1SH2SdtWwT2yXTrYUl2kdKYpgE8TX3C c7Ap0vKQIcRrycoOaoxsKw915jeA5zCyykd75RLfzmK2phW22QtZgdIOuiflDeds LAuelYaY6C7ZRPnGn2iWceoWS3IBhXTo4nsfh6sPX3A057iHo7CFjX7u1DeMqcuh XIoKOgjZR/vnJQaFdWTSKKbzwTweAc1BBDUYy4CxWbUMD13GIE2trCS+GBWTZcoF KaASIdXL4nUHP35rX9hlww5GUhF6NNOTZ9mkN7NHYfoVy0WXt/rLCywqo3D6Bne+ jeTHwFKjJYI= =jDS4 -----END PGP SIGNATURE----- Merge tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Miscellaneous scheduler fixes: a reporting fix, a static symbol fix, and a kernel-doc fix" * tag 'sched-urgent-2023-09-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Report correct state for TASK_IDLE | TASK_FREEZABLE sched/fair: Make update_entity_lag() static sched/core: Add kernel-doc for set_cpus_allowed_ptr() |
||
Linus Torvalds
|
76be05d4fd |
cgroup: fix build when CGROUP_SCHED is not enabled
Sudip Mukherjee reports that the mips sb1250_swarm_defconfig build fails
with the current kernel. It isn't actually MIPS-specific, it's just
that that defconfig does not have CGROUP_SCHED enabled like most configs
do, and as such shows this error:
kernel/cgroup/cgroup.c: In function 'cgroup_local_stat_show':
kernel/cgroup/cgroup.c:3699:15: error: implicit declaration of function 'cgroup_tryget_css'; did you mean 'cgroup_tryget'? [-Werror=implicit-function-declaration]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^~~~~~~~~~~~~~~~~
| cgroup_tryget
kernel/cgroup/cgroup.c:3699:13: warning: assignment to 'struct cgroup_subsys_state *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
3699 | css = cgroup_tryget_css(cgrp, ss);
| ^
because cgroup_tryget_css() only exists when CGROUP_SCHED is enabled,
and the cgroup_local_stat_show() function should similarly be guarded by
that config option.
Move things around a bit to fix this all.
Fixes:
|
||
Valentin Schneider
|
cbb557ba92 |
tracing/filters: Fix coding style issues
Recent commits have introduced some coding style issues, fix those up. Link: https://lkml.kernel.org/r/20230901151039.125186-5-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Valentin Schneider
|
2900bcbee3 |
tracing/filters: Change parse_pred() cpulist ternary into an if block
Review comments noted that an if block would be clearer than a ternary, so swap it out. No change in behaviour intended Link: https://lkml.kernel.org/r/20230901151039.125186-4-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Valentin Schneider
|
1caf7adb9e |
tracing/filters: Fix double-free of struct filter_pred.mask
When a cpulist filter is found to contain a single CPU, that CPU is saved as a scalar and the backing cpumask storage is freed. Also NULL the mask to avoid a double-free once we get down to free_predicate(). Link: https://lkml.kernel.org/r/20230901151039.125186-3-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Valentin Schneider
|
9af4058493 |
tracing/filters: Fix error-handling of cpulist parsing buffer
parse_pred() allocates a string buffer to parse the user-provided cpulist, but doesn't check the allocation result nor does it free the buffer once it is no longer needed. Add an allocation check, and free the buffer as soon as it is no longer needed. Link: https://lkml.kernel.org/r/20230901151039.125186-2-vschneid@redhat.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Brian Foster
|
3d07fa1dd1 |
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:
# cat /sys/kernel/debug/tracing/trace_pipe
cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes:
|
||
Ruan Jinjie
|
2a30dbcbef |
ftrace: Use LIST_HEAD to initialize clear_hash
Use LIST_HEAD() to initialize clear_hash instead of open-coding it. Link: https://lore.kernel.org/linux-trace-kernel/20230809071551.913041-1-ruanjinjie@huawei.com Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Levi Yun
|
1351148904 |
ftrace: Use within_module to check rec->ip within specified module.
within_module_core && within_module_init condition is same to within module but it's more readable. Use within_module instead of former condition to check rec->ip within specified module area or not. Link: https://lore.kernel.org/linux-trace-kernel/20230803205236.32201-1-ppbuk5246@gmail.com Signed-off-by: Levi Yun <ppbuk5246@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
||
Zheng Yejian
|
3163f635b2 |
tracing: Fix race issue between cpu buffer write and swap
Warning happened in rb_end_commit() at code:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:
Write on CPU 0 Swap buffer by per_cpu/cpu0/snapshot on CPU 1
-------- --------
tracing_snapshot_write()
[...]
ring_buffer_lock_reserve()
cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
[...]
rb_reserve_next_event()
[...]
ring_buffer_swap_cpu()
if (local_read(&cpu_buffer_a->committing))
goto out_dec;
if (local_read(&cpu_buffer_b->committing))
goto out_dec;
buffer_a->buffers[cpu] = cpu_buffer_b;
buffer_b->buffers[cpu] = cpu_buffer_a;
// 2. cpu_buffer has swapped here.
rb_start_commit(cpu_buffer);
if (unlikely(READ_ONCE(cpu_buffer->buffer)
!= buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
[...] // has not changed here.
return NULL;
}
cpu_buffer_b->buffer = buffer_a;
cpu_buffer_a->buffer = buffer_b;
[...]
// 4. Reserve event from 'cpu_buffer_a'.
ring_buffer_unlock_commit()
[...]
cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
rb_commit(cpu_buffer)
rb_end_commit() // 6. WARN for the wrong 'committing' state !!!
Based on above analysis, we can easily reproduce by following testcase:
``` bash
#!/bin/bash
dmesg -n 7
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while [ true ]; do
echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes:
|
||
Mikhail Kobuk
|
2cf0dee989 |
tracing: Remove extra space at the end of hwlat_detector/mode
Space is printed after each mode value including the last one:
$ echo \"$(sudo cat /sys/kernel/tracing/hwlat_detector/mode)\"
"none [round-robin] per-cpu "
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230825103432.7750-1-m.kobuk@ispras.ru
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
|
||
Linus Torvalds
|
34232fcfe9 |
Tracing updates for 6.6:
User visible changes: - Added a way to easier filter with cpumasks: # echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter - Show actual size of ring buffer after modifying the ring buffer size via buffer_size_kb. Currently it just returns what was written, but the actual size rounds up to the sub buffer size. Show that real size instead. Major changes: - Added "eventfs". This is the code that handles the inodes and dentries of tracefs/events directory. As there are thousands of events, and each event has several inodes and dentries that currently exist even when tracing is never used, they take up precious memory. Instead, eventfs will allocate the inodes and dentries in a JIT way (similar to what procfs does). There is now metadata that handles the events and subdirectories, and will create the inodes and dentries when they are used. Note, I also have patches that remove the subdirectory meta data, but will wait till the next merge window before applying them. It's a little more complex, and I want to make sure the dynamic code works properly before adding more complexity, making it easier to revert if need be. Minor changes: - Optimization to user event list traversal. - Remove intermediate permission of tracefs files (note the intermediate permission removes all access to the files so it is not a security concern, but just a clean up.) - Add the complex fix to FORTIFY_SOURCE to the kernel stack event logic. - Other minor clean ups. -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEXtmkj8VMCiLR0IBM68Js21pW3nMFAmTwtAsUHHJvc3RlZHRA Z29vZG1pcy5vcmcACgkQ68Js21pW3nNOXRAAsslQT6alY4OeplC4x47+V6+6NiIA oDtOmWAqf7TsH9bukzRFD36rUly42O20RJDx9z0Q3iRc3vGxEawId8z6P0HmBwRb VSl5BryWvL5Wc5w94xS8EeCuC1MRfhVDyfbtVFmWigzfvd/f+hp71ViMPHUvrRJX KhzzNSBc4ir5E1lzfwa7meYTXzDwrQlZbYfdf5aH94IWAkqDj85PUZDJ7UmLZhXG CIglSpNFXZ0j19Wo/U6KZlHR1XfunBKungCzJ5Dbznc9YLWZTQXOIZF4YPKfPIJL ulRG9chwXY0nQWhG3xM1UHZLsAMSWw5i13a4ZN4d8FCNOgv8ttcJnfDk7ZYUS0Oz RmY1dGcSRKAZTUTjm8ZBtmyiUCc9kZAIk0fyEfIHtoDYXmhnvni3wuTnbRSdXaSi q4YkxPaLfX8Fn3QloCqqddt8iONu7BnbpZOhUCl2AtBib52gnTTF7+rQ6/0D3rjo SSuvEHhnjJhzk+3jM2odxjmTAztNT+yu6FbKXZUKPt1Kj9YHv1J9cEQw9/Etw+GV 8jQBe979D8hFJmDOJOT/O/TdPqE9mQoMNBt6Y8QnE4nbJWM+i/MBrThFpUSQhRCr 0Ya/HgR2QyRH7RmZW5o2H9mNtN+V9c7RxZW8erYzRbUs0YofK2OpGi9SrPzxWCke w6j0VVZHaxdPguM= =/s+e -----END PGP SIGNATURE----- Merge tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: "User visible changes: - Added a way to easier filter with cpumasks: # echo 'cpumask & CPUS{17-42}' > /sys/kernel/tracing/events/ipi_send_cpumask/filter - Show actual size of ring buffer after modifying the ring buffer size via buffer_size_kb. Currently it just returns what was written, but the actual size rounds up to the sub buffer size. Show that real size instead. Major changes: - Added "eventfs". This is the code that handles the inodes and dentries of tracefs/events directory. As there are thousands of events, and each event has several inodes and dentries that currently exist even when tracing is never used, they take up precious memory. Instead, eventfs will allocate the inodes and dentries in a JIT way (similar to what procfs does). There is now metadata that handles the events and subdirectories, and will create the inodes and dentries when they are used. Note, I also have patches that remove the subdirectory meta data, but will wait till the next merge window before applying them. It's a little more complex, and I want to make sure the dynamic code works properly before adding more complexity, making it easier to revert if need be. Minor changes: - Optimization to user event list traversal - Remove intermediate permission of tracefs files (note the intermediate permission removes all access to the files so it is not a security concern, but just a clean up) - Add the complex fix to FORTIFY_SOURCE to the kernel stack event logic - Other minor cleanups" * tag 'trace-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits) tracefs: Remove kerneldoc from struct eventfs_file tracefs: Avoid changing i_mode to a temp value tracing/user_events: Optimize safe list traversals ftrace: Remove empty declaration ftrace_enable_daemon() and ftrace_disable_daemon() tracing: Remove unused function declarations tracing/filters: Document cpumask filtering tracing/filters: Further optimise scalar vs cpumask comparison tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU tracing/filters: Enable filtering the CPU common field by a cpumask tracing/filters: Enable filtering a scalar field by a cpumask tracing/filters: Enable filtering a cpumask field by another cpumask tracing/filters: Dynamically allocate filter_pred.regex test: ftrace: Fix kprobe test for eventfs eventfs: Move tracing/events to eventfs eventfs: Implement removal of meta data from eventfs eventfs: Implement functions to create files and dirs when accessed eventfs: Implement eventfs lookup, read, open functions eventfs: Implement eventfs file add functions ... |
||
Linus Torvalds
|
bd30fe6a7d |
workqueue: Changes for v6.6
* Unbound workqueues now support more flexible affinity scopes. The default behavior is to soft-affine according to last level cache boundaries. A work item queued from a given LLC is executed by a worker running on the same LLC but the worker may be moved across cache boundaries as the scheduler sees fit. On machines which multiple L3 caches, which are becoming more popular along with chiplet designs, this improves cache locality while not harming work conservation too much. Unbound workqueues are now also a lot more flexible in terms of execution affinity. Differeing levels of affinity scopes are supported and both the default and per-workqueue affinity settings can be modified dynamically. This should help working around amny of sub-optimal behaviors observed recently with asymmetric ARM CPUs. This involved signficant restructuring of workqueue code. Nothing was reported yet but there's some risk of subtle regressions. Should keep an eye out. * Rescuer workers now has more identifiable comms. * workqueue.unbound_cpus added so that CPUs which can be used by workqueue can be constrained early during boot. * Now that all the in-tree users have been flushed out, trigger warning if system-wide workqueues are flushed. * One pull commit from for-6.5-fixes to avoid cascading conflicts in the affinity scope patchset. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZPERlQ4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGVqQAPwIOy9tWY5jFAmMuIyH6wV50hbmfxCc2n5xhQNr 5HoyGgEA8lw1W7afDCIPiQVA7AYsu8dhwuNSOcRCJxhrrn4XsA0= =g/Uu -----END PGP SIGNATURE----- Merge tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue updates from Tejun Heo: - Unbound workqueues now support more flexible affinity scopes. The default behavior is to soft-affine according to last level cache boundaries. A work item queued from a given LLC is executed by a worker running on the same LLC but the worker may be moved across cache boundaries as the scheduler sees fit. On machines which multiple L3 caches, which are becoming more popular along with chiplet designs, this improves cache locality while not harming work conservation too much. Unbound workqueues are now also a lot more flexible in terms of execution affinity. Differeing levels of affinity scopes are supported and both the default and per-workqueue affinity settings can be modified dynamically. This should help working around amny of sub-optimal behaviors observed recently with asymmetric ARM CPUs. This involved signficant restructuring of workqueue code. Nothing was reported yet but there's some risk of subtle regressions. Should keep an eye out. - Rescuer workers now has more identifiable comms. - workqueue.unbound_cpus added so that CPUs which can be used by workqueue can be constrained early during boot. - Now that all the in-tree users have been flushed out, trigger warning if system-wide workqueues are flushed. * tag 'wq-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (31 commits) workqueue: fix data race with the pwq->stats[] increment workqueue: Rename rescuer kworker workqueue: Make default affinity_scope dynamically updatable workqueue: Add "Affinity Scopes and Performance" section to documentation workqueue: Implement non-strict affinity scope for unbound workqueues workqueue: Add workqueue_attrs->__pod_cpumask workqueue: Factor out need_more_worker() check and worker wake-up workqueue: Factor out work to worker assignment and collision handling workqueue: Add multiple affinity scopes and interface to select them workqueue: Modularize wq_pod_type initialization workqueue: Add tools/workqueue/wq_dump.py which prints out workqueue configuration workqueue: Generalize unbound CPU pods workqueue: Factor out clearing of workqueue-only attrs fields workqueue: Factor out actual cpumask calculation to reduce subtlety in wq_update_pod() workqueue: Initialize unbound CPU pods later in the boot workqueue: Move wq_pod_init() below workqueue_init() workqueue: Rename NUMA related names to use pod instead workqueue: Rename workqueue_attrs->no_numa to ->ordered workqueue: Make unbound workqueues to use per-cpu pool_workqueues workqueue: Call wq_update_unbound_numa() on all CPUs in NUMA node on CPU hotplug ... |
||
Linus Torvalds
|
7716f383a5 |
cgroup: Changes for v6.6
* Per-cpu cpu usage stats are now tracked. This currently isn't printed out
in the cgroupfs interface and can only be accessed through e.g. BPF.
Should decide on a not-too-ugly way to show per-cpu stats in cgroupfs.
* cpuset received some cleanups and prepatory patches for the pending
cpus.exclusive patchset which will allow cpuset partitions to be created
below non-partition parents, which should ease the management of partition
cpusets.
* A lot of code and documentation cleanup patches.
* tools/testing/selftests/cgroup/test_cpuset.c is added. This causes trivial
conflicts in .gitignore and Makefile under the directory against
|
||
Linus Torvalds
|
e987af4546 |
percpu: changes for v6.6
percpu * A couple cleanups by Baoquan He and Bibo Mao. The only behavior change is to start printing messages if we're under the warn limit for failed atomic allocations. percpu_counter * Shakeel introduced percpu counters into mm_struct which caused percpu allocations be on the hot path [1]. Originally I spent some time trying to improve the percpu allocator, but instead preferred what Mateusz Guzik proposed grouping at the allocation site, percpu_counter_init_many(). This allows a single percpu allocation to be shared by the counters. I like this approach because it creates a shared lifetime by the allocations. Additionally, I believe many inits have higher level synchronization requirements, like percpu_counter does against HOTPLUG_CPU. Therefore we can group these optimizations together. [1] https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/ -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE3hZPHJdcVwe+yTTtiDc0yuoFPR0FAmTv2IUACgkQiDc0yuoF PR0+gg//U430Y9jRSKQtbh3dEPaAeWGcTfSTnVHbQGfBj3A4ePJyWl/Tgzri31AC rzr8SRs0yX8b82TbECWsV67i/GrntLJyz4yQ52S/RRqVwnQqSn/wicEdCY00lJBt Tye8zApOnYBouaYqIOxm/M7ofvKzJ3gWOVeF/zBwM6hwvNaXXtY5r86fSDxoEbhY HOFnCDmg5Spf0U50j1G7nV5KfAb7BNA3/HFyzfzH+w+OWi4IGbThsfrg1qvjyFot KlEK/kF8Af2xj2A2se4XFsLc2D/Tj+29juYVQqIPBJzVPrZ2uerKSszK5Zcr+Use kMiG7tRWKE+2vkOM1RQ5Y5NCVEBhlXlienz1gf/C7247SEGs6OIyqvyDAgPTRx6p oR2/vx9hMtaSMf4aHWd+fYS5gNZ05iMvOIbRZnI1wZkQglQVkJvXhzuLaJ+dIGSP ypv6XOepik7vDjZ3p3xJXd0TAn4NSkn3jWRetrymdtMFanF99qw1VqjmkLecSil0 Gr0UhRL1oiMde6niVJrOpdOGLwt/M4N99Y5rksw6NCnktRJ99coFGj7LglZGMsu+ YkOyjD8MVJXTkBtBNGeqHTKe6nyVkHFq9ad5EmWjPkefP5JziH8i18k7JlF1dLA5 c8peq3ES659D5f0mU2jilD9PsCsBfSn6Of4ruMZa2Zr1XDD8snI= =vcA1 -----END PGP SIGNATURE----- Merge tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu Pull percpu updates from Dennis Zhou: "One bigger change to percpu_counter's api allowing for init and destroy of multiple counters via percpu_counter_init_many() and percpu_counter_destroy_many(). This is used to help begin remediating a performance regression with percpu rss stats. Additionally, it seems larger core count machines are feeling the burden of the single threaded allocation of percpu. Mateusz is thinking about it and I will spend some time on it too. percpu: - A couple cleanups by Baoquan He and Bibo Mao. The only behavior change is to start printing messages if we're under the warn limit for failed atomic allocations. percpu_counter: - Shakeel introduced percpu counters into mm_struct which caused percpu allocations be on the hot path [1]. Originally I spent some time trying to improve the percpu allocator, but instead preferred what Mateusz Guzik proposed grouping at the allocation site, percpu_counter_init_many(). This allows a single percpu allocation to be shared by the counters. I like this approach because it creates a shared lifetime by the allocations. Additionally, I believe many inits have higher level synchronization requirements, like percpu_counter does against HOTPLUG_CPU. Therefore we can group these optimizations together" Link: https://lore.kernel.org/linux-mm/20221024052841.3291983-1-shakeelb@google.com/ [1] * tag 'percpu-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu: kernel/fork: group allocation/free of per-cpu counters for mm struct pcpcntr: add group allocation/free mm/percpu.c: print error message too if atomic alloc failed mm/percpu.c: optimize the code in pcpu_setup_first_chunk() a little bit mm/percpu.c: remove redundant check mm/percpu: Remove some local variables in pcpu_populate_pte |
||
Linus Torvalds
|
8e1e49550d |
TTY/Serial driver changes for 6.6-rc1
Here is the big set of tty and serial driver changes for 6.6-rc1. Lots of cleanups in here this cycle, and some driver updates. Short summary is: - Jiri's continued work to make the tty code and apis be a bit more sane with regards to modern kernel coding style and types - cpm_uart driver updates - n_gsm updates and fixes - meson driver updates - sc16is7xx driver updates - 8250 driver updates for different hardware types - qcom-geni driver fixes - tegra serial driver change - stm32 driver updates - synclink_gt driver cleanups - tty structure size reduction All of these have been in linux-next this week with no reported issues. The last bit of cleanups from Jiri and the tty structure size reduction came in last week, a bit late but as they were just style changes and size reductions, I figured they should get into this merge cycle so that others can work on top of them with no merge conflicts. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZPH+jA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ykKyACgldt6QeenTN+6dXIHS/eQHtTKZwMAn3arSeXI QrUUnLFjOWyoX87tbMBQ =LVw0 -----END PGP SIGNATURE----- Merge tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver updates from Greg KH: "Here is the big set of tty and serial driver changes for 6.6-rc1. Lots of cleanups in here this cycle, and some driver updates. Short summary is: - Jiri's continued work to make the tty code and apis be a bit more sane with regards to modern kernel coding style and types - cpm_uart driver updates - n_gsm updates and fixes - meson driver updates - sc16is7xx driver updates - 8250 driver updates for different hardware types - qcom-geni driver fixes - tegra serial driver change - stm32 driver updates - synclink_gt driver cleanups - tty structure size reduction All of these have been in linux-next this week with no reported issues. The last bit of cleanups from Jiri and the tty structure size reduction came in last week, a bit late but as they were just style changes and size reductions, I figured they should get into this merge cycle so that others can work on top of them with no merge conflicts" * tag 'tty-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits) tty: shrink the size of struct tty_struct by 40 bytes tty: n_tty: deduplicate copy code in n_tty_receive_buf_real_raw() tty: n_tty: extract ECHO_OP processing to a separate function tty: n_tty: unify counts to size_t tty: n_tty: use u8 for chars and flags tty: n_tty: simplify chars_in_buffer() tty: n_tty: remove unsigned char casts from character constants tty: n_tty: move newline handling to a separate function tty: n_tty: move canon handling to a separate function tty: n_tty: use MASK() for masking out size bits tty: n_tty: make n_tty_data::num_overrun unsigned tty: n_tty: use time_is_before_jiffies() in n_tty_receive_overrun() tty: n_tty: use 'num' for writes' counts tty: n_tty: use output character directly tty: n_tty: make flow of n_tty_receive_buf_common() a bool Revert "tty: serial: meson: Add a earlycon for the T7 SoC" Documentation: devices.txt: Fix minors for ttyCPM* Documentation: devices.txt: Remove ttySIOC* Documentation: devices.txt: Remove ttyIOC* serial: 8250_bcm7271: improve bcm7271 8250 port ... |
||
Linus Torvalds
|
4ad0a4c234 |
powerpc updates for 6.6
- Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system. - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit. - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks. - Add support for the "nohlt" command line option to disable CPU idle. - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1. - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory. - Various other small features and fixes. Thanks to: Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, Zheng Zengkai. -----BEGIN PGP SIGNATURE----- iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmTwgbwTHG1wZUBlbGxl cm1hbi5pZC5hdQAKCRBR6+o8yOGlgFmpD/432vipeoqvkAYsyK0xi/Y3GcY0wcyd WJApLXXadEbtKQrgXQ6sowWqalg5thYnQCRarg/tXKK/po3KfgwkPjGDpOL+cIdr 12QVN2XJm9VmJ1wYJxzk+yXx4F43AdmMdr94qWAGufbTHezwb4UpzVR1NxtFrOE/ X5TNsC2+2mdZY/ZaNHS5vsTIFv3EhQfqgjZPlIAdLn6CGc8xWT514Q/uHA8+ytM/ HL7Hqs33DoPSvgTa5TT/2E0d0k5nO3P5KObzAjpYlireTPaBi51mpKGewcrtm0o2 v3cBlbfx3C7pe9ZhKBK9BH8cjynfiqsVZ9/lCw/7eBNdm9tHuzG0jeS7Db9tCZXS fM7G2R7SoIusPTqxlBmkU5DpYslwrHiVgCyy3ijxkoA/fakVwh/GgTcMsRt73IY6 n6DsUvWwuYHCIeIiHmHQJqCqCRtV+aMzU3AbbBHOjtdIanhlW16M686dEsgCirh7 akRVRD5VqKaqXs34PpkRL89Xv3wZRjl6XZ3hZFfCjSYXfpXDXhgSToIskpHYhKL8 gpY7WtG9YQP05Xz5HRCx6EluaZVeKe0lZi6fezX7Mi9AygJQO8FfXqP1mHBlEq40 ThWtvL9D89RV6lADqqFN20XepgvKNOyAXcE4szvsnIZYUSPmZQZSPxx+DHtROaLP jX3ifxtxJp92pQ== =5g7K -----END PGP SIGNATURE----- Merge tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Add HOTPLUG_SMT support (/sys/devices/system/cpu/smt) and honour the configured SMT state when hotplugging CPUs into the system - Combine final TLB flush and lazy TLB mm shootdown IPIs when using the Radix MMU to avoid a broadcast TLBIE flush on exit - Drop the exclusion between ptrace/perf watchpoints, and drop the now unused associated arch hooks - Add support for the "nohlt" command line option to disable CPU idle - Add support for -fpatchable-function-entry for ftrace, with GCC >= 13.1 - Rework memory block size determination, and support 256MB size on systems with GPUs that have hotpluggable memory - Various other small features and fixes Thanks to Andrew Donnellan, Aneesh Kumar K.V, Arnd Bergmann, Athira Rajeev, Benjamin Gray, Christophe Leroy, Frederic Barrat, Gautam Menghani, Geoff Levand, Hari Bathini, Immad Mir, Jialin Zhang, Joel Stanley, Jordan Niethe, Justin Stitt, Kajol Jain, Kees Cook, Krzysztof Kozlowski, Laurent Dufour, Liang He, Linus Walleij, Mahesh Salgaonkar, Masahiro Yamada, Michal Suchanek, Nageswara R Sastry, Nathan Chancellor, Nathan Lynch, Naveen N Rao, Nicholas Piggin, Nick Desaulniers, Omar Sandoval, Randy Dunlap, Reza Arbab, Rob Herring, Russell Currey, Sourabh Jain, Thomas Gleixner, Trevor Woerner, Uwe Kleine-König, Vaibhav Jain, Xiongfeng Wang, Yuan Tan, Zhang Rui, and Zheng Zengkai. * tag 'powerpc-6.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (135 commits) macintosh/ams: linux/platform_device.h is needed powerpc/xmon: Reapply "Relax frame size for clang" powerpc/mm/book3s64: Use 256M as the upper limit with coherent device memory attached powerpc/mm/book3s64: Fix build error with SPARSEMEM disabled powerpc/iommu: Fix notifiers being shared by PCI and VIO buses powerpc/mpc5xxx: Add missing fwnode_handle_put() powerpc/config: Disable SLAB_DEBUG_ON in skiroot powerpc/pseries: Remove unused hcall tracing instruction powerpc/pseries: Fix hcall tracepoints with JUMP_LABEL=n powerpc: dts: add missing space before { powerpc/eeh: Use pci_dev_id() to simplify the code powerpc/64s: Move CPU -mtune options into Kconfig powerpc/powermac: Fix unused function warning powerpc/pseries: Rework lppaca_shared_proc() to avoid DEBUG_PREEMPT powerpc: Don't include lppaca.h in paca.h powerpc/pseries: Move hcall_vphn() prototype into vphn.h powerpc/pseries: Move VPHN constants into vphn.h cxl: Drop unused detach_spa() powerpc: Drop zalloc_maybe_bootmem() powerpc/powernv: Use struct opal_prd_msg in more places ... |
||
Linus Torvalds
|
df57721f9a |
Add x86 shadow stack support
Convert IBT selftest to asm to fix objtool warning -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmTv1QQACgkQaDWVMHDJ krAUwhAAn6TOwHJK8BSkHeiQhON1nrlP3c5cv0AyZ2NP8RYDrZrSZvhpYBJ6wgKC Cx5CGq5nn9twYsYS3KsktLKDfR3lRdsQ7K9qtyFtYiaeaVKo+7gEKl/K+klwai8/ gninQWHk0zmSCja8Vi77q52WOMkQKapT8+vaON9EVDO8dVEi+CvhAIfPwMafuiwO Rk4X86SzoZu9FP79LcCg9XyGC/XbM2OG9eNUTSCKT40qTTKm5y4gix687NvAlaHR ko5MTsdl0Wfp6Qk0ohT74LnoA2c1g/FluvZIM33ci/2rFpkf9Hw7ip3lUXqn6CPx rKiZ+pVRc0xikVWkraMfIGMJfUd2rhelp8OyoozD7DB7UZw40Q4RW4N5tgq9Fhe9 MQs3p1v9N8xHdRKl365UcOczUxNAmv4u0nV5gY/4FMC6VjldCl2V9fmqYXyzFS4/ Ogg4FSd7c2JyGFKPs+5uXyi+RY2qOX4+nzHOoKD7SY616IYqtgKoz5usxETLwZ6s VtJOmJL0h//z0A7tBliB0zd+SQ5UQQBDC2XouQH2fNX2isJMn0UDmWJGjaHgK6Hh 8jVp6LNqf+CEQS387UxckOyj7fu438hDky1Ggaw4YqowEOhQeqLVO4++x+HITrbp AupXfbJw9h9cMN63Yc0gVxXQ9IMZ+M7UxLtZ3Cd8/PVztNy/clA= =3UUm -----END PGP SIGNATURE----- Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ... |
||
Linus Torvalds
|
cd99b9eb4b |
Documentation work keeps chugging along; stuff for 6.6 includes:
- Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there. - Move the loongarch and mips architecture documentation under Documentation/arch/. - Some more maintainer documentation from Jakub ...plus the usual assortment of updates, translations, and fixes. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmTvqNkPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YgIgH/3drfLtlFtzLqDOzrzDXS8yGnE3pPdxw796b /ZFzAK16wYKaKevYoIz8bVGGKaE1sEUW0mhlq4KGdfZuxLG8YnWS8URyCW4FDU2E 6qNL+8oJ8LZfID46f9Q8ZgfEz7yF/mhCqPk7MEswYtwbscs2ZTGCTGYB/5BHlBuT LR+M89uLmHgr8S1o24v30OgiX+VvQFyu0xoxIhbiqUZvBd/XdfX2pgYd9BGzMj5q C2ZP+V14g36c5pV0EO9TwhCXOF/WVrp7DbjbfWAsqBSLxvpXPydH2q1DUzGeQtP1 exujrBD1O8q3pPdaNA5R+h6cWlHmUZug9mE4BRLp9ErGrozwJsQ= =C3Uv -----END PGP SIGNATURE----- Merge tag 'docs-6.6' of git://git.lwn.net/linux Pull documentation updates from Jonathan Corbet: "Documentation work keeps chugging along; this includes: - Work from Carlos Bilbao to integrate rustdoc output into the generated HTML documentation. This took some work to figure out how to do it without slowing the docs build and without creating people who don't have Rust installed, but Carlos got there - Move the loongarch and mips architecture documentation under Documentation/arch/ - Some more maintainer documentation from Jakub ... plus the usual assortment of updates, translations, and fixes" * tag 'docs-6.6' of git://git.lwn.net/linux: (56 commits) Docu: genericirq.rst: fix irq-example input: docs: pxrc: remove reference to phoenix-sim Documentation: serial-console: Fix literal block marker docs/mm: remove references to hmm_mirror ops and clean typos docs/zh_CN: correct regi_chg(),regi_add() to region_chg(),region_add() Documentation: Fix typos Documentation/ABI: Fix typos scripts: kernel-doc: fix macro handling in enums scripts: kernel-doc: parse DEFINE_DMA_UNMAP_[ADDR|LEN] Documentation: riscv: Update boot image header since EFI stub is supported Documentation: riscv: Add early boot document Documentation: arm: Add bootargs to the table of added DT parameters docs: kernel-parameters: Refer to the correct bitmap function doc: update params of memhp_default_state= docs: Add book to process/kernel-docs.rst docs: sparse: fix invalid link addresses docs: vfs: clean up after the iterate() removal docs: Add a section on surveys to the researcher guidelines docs: move mips under arch docs: move loongarch under arch ... |