For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
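The rule and its extended exception can be modelled roughly as below. This is a toy Python sketch, not objtool's C code: the `CallSite` type and all of its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CallSite:
    """Hypothetical, simplified view of one CALL instruction."""
    target: str
    has_frame_setup: bool                  # frame pointer save/setup seen before the call?
    target_is_fentry: bool = False         # call to __fentry__
    target_is_embedded_insn: bool = False  # call into an embedded instruction


def needs_frame_warning(call: CallSite) -> bool:
    """Return True if the call would be flagged in a frame-pointer build."""
    if call.target_is_fentry or call.target_is_embedded_insn:
        return False  # special calls are exempt from the frame-setup rule
    return not call.has_frame_setup
```

Under this model, a call to an embedded-instruction target without a frame setup is no longer reported, while an ordinary call without one still is.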
Fixes: 4ae68b26c3 ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-ass.net
Rename the original retbleed return thunk and untrain_ret to
retbleed_return_thunk() and retbleed_untrain_ret().
No functional changes.
Suggested-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.909378169@infradead.org
Use the existing configurable return thunk. There is absolutely no
justification for having created this __x86_return_thunk alternative.
To clarify, the whole thing looks like:
Zen3/4 does:
  srso_alias_untrain_ret:
      nop2
      lfence
      jmp srso_alias_return_thunk
      int3
  srso_alias_safe_ret: // aliases srso_alias_untrain_ret just so
      add $8, %rsp
      ret
      int3
  srso_alias_return_thunk:
      call srso_alias_safe_ret
      ud2
While Zen1/2 does:
  srso_untrain_ret:
      movabs $foo, %rax
      lfence
      call srso_safe_ret (jmp srso_return_thunk ?)
      int3
  srso_safe_ret: // embedded in movabs instruction
      add $8,%rsp
      ret
      int3
  srso_return_thunk:
      call srso_safe_ret
      ud2
While retbleed does:
  zen_untrain_ret:
      test $0xcc, %bl
      lfence
      jmp zen_return_thunk
      int3
  zen_return_thunk: // embedded in the test instruction
      ret
      int3
Where Zen1/2 flush the BTB entry using the instruction decoder trick
(test,movabs), Zen3/4 use BTB aliasing. SRSO adds a return sequence
(srso_safe_ret()) which forces the function return instruction to
speculate into a trap (UD2). This RET will then mispredict and
execution will continue at the return site read from the top of the
stack.
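The stack mechanics of the `call srso_safe_ret; add $8,%rsp; ret` sequence can be traced with a toy model. This is a deliberately simplified Python sketch: the list-based stack, the one-entry "predictor" and the addresses are all assumptions made for illustration and say nothing about the real microarchitecture.

```python
def run_safe_ret(stack, ud2_addr):
    """Model `call srso_safe_ret; add $8,%rsp; ret` on a list used as a stack.

    The last list element is the top of the stack. Returns a tuple of
    (architectural return target, address the return predictor points at).
    """
    stack = list(stack)  # work on a copy
    rsb = []             # toy return-address predictor

    # call srso_safe_ret: push the address following the CALL (the UD2 trap)
    stack.append(ud2_addr)
    rsb.append(ud2_addr)

    # add $8, %rsp: discard the return address the CALL just pushed
    stack.pop()

    # ret: the predictor speculates into the trap, while the architectural
    # target is the caller's return address read from the top of the stack
    predicted = rsb.pop()
    actual = stack.pop()
    return actual, predicted
```

With a caller return address of 0x1000 on the stack and the trap at 0x2000, `run_safe_ret([0x1000], 0x2000)` yields the pair (0x1000, 0x2000): speculation goes to the trap, execution continues at the return site.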
Pick one of three options at boot (every function can only ever return
once).
[ bp: Fixup commit message uarch details and add them in a comment in
the code too. Add a comment about the srso_select_mitigation()
dependency on retbleed_select_mitigation(). Add moar ifdeffery for
32-bit builds. Add a dummy srso_untrain_ret_alias() definition for
32-bit alternatives needing the symbol. ]
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.842775684@infradead.org
Objtool --rethunk does two things:
- it collects all (tail) calls of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
RET also emits the same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate them.
However, alas, along came SRSO, and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previously).
Clear things up by adding a second category for the embedded instruction
thing.
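The two categories can be pictured as two independent predicates over symbol names. The Python sketch below is a model only: which name ends up in which set is an assumption inferred from the description above, and the function name `classify` is hypothetical.

```python
# Assumed split after this change; treat the exact membership as illustrative.
RETURN_THUNKS = {"__x86_return_thunk"}       # (tail) calls collected into .return_sites
EMBEDDED_INSNS = {"__ret", "srso_safe_ret"}  # symbols located inside another instruction


def classify(symbol: str) -> set:
    """Return the set of categories a symbol name falls into."""
    kinds = set()
    if symbol in RETURN_THUNKS:
        kinds.add("rethunk")
    if symbol in EMBEDDED_INSNS:
        kinds.add("embedded_insn")
    return kinds
```

In this model 'srso_untrain_ret' falls into neither category, which matches the observation that it is unrelated to both jobs.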
Fixes: fb3bd914b3 ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
The linker script arch/x86/kernel/vmlinux.lds.S matches the thunk
sections ".text.__x86.*" from arch/x86/lib/retpoline.S as follows:
.text {
    [...]
    TEXT_TEXT
    [...]
    __indirect_thunk_start = .;
    *(.text.__x86.*)
    __indirect_thunk_end = .;
    [...]
}
Macro TEXT_TEXT references TEXT_MAIN which normally expands to only
".text". However, with CONFIG_LTO_CLANG, TEXT_MAIN becomes
".text .text.[0-9a-zA-Z_]*" which also wrongly matches the thunk
sections. The output layout then differs from what is expected. For
instance, the currently defined range [__indirect_thunk_start,
__indirect_thunk_end] becomes empty.
Prevent the problem by using ".." as the first separator, for example,
".text..__x86.indirect_thunk". This pattern is utilized by other
explicit section names which start with one of the standard prefixes,
such as ".text" or ".data", and that need to be individually selected in
the linker script.
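The effect of the extra dot can be checked with ordinary glob matching. The Python sketch below uses `fnmatch` purely as a stand-in for the linker's pattern matching, and the old section name is assumed to be the new one with a single dot.

```python
from fnmatch import fnmatchcase

# Second pattern of TEXT_MAIN under CONFIG_LTO_CLANG, as quoted above.
LTO_TEXT_MAIN = ".text.[0-9a-zA-Z_]*"


def swallowed_by_text_main(section: str) -> bool:
    """True if the LTO TEXT_MAIN pattern would claim this section name."""
    return fnmatchcase(section, LTO_TEXT_MAIN)


old_name = ".text.__x86.indirect_thunk"   # assumed old name, single dot
new_name = ".text..__x86.indirect_thunk"  # name proposed above, double dot

print(old_name, swallowed_by_text_main(old_name))
print(new_name, swallowed_by_text_main(new_name))
```

The old name matches because '_' is in the character class that must follow ".text."; in the new name that position holds a '.', which is not in the class, so only an explicit pattern such as ".text..__x86.*" selects it.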
[ nathan: Fix conflicts with SRSO and fold in fix issue brought up by
Andrew Cooper in post-review:
https://lore.kernel.org/20230803230323.1478869-1-andrew.cooper3@citrix.com ]
Fixes: dc5723b02e ("kbuild: add support for Clang LTO")
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230711091952.27944-2-petr.pavlu@suse.com
Add a mitigation for the speculative return address stack overflow
vulnerability found on AMD processors.
The mitigation works by ensuring all RET instructions speculate to
a controlled location, similar to how speculation is controlled in the
retpoline sequence. To accomplish this, the __x86_return_thunk forces
the CPU to mispredict every function return using a 'safe return'
sequence.
To ensure the safety of this mitigation, the kernel must ensure that the
safe return sequence is itself free from attacker interference. In Zen3
and Zen4, this is accomplished by creating a BTB alias between the
untraining function srso_untrain_ret_alias() and the safe return
function srso_safe_ret_alias() which results in evicting a potentially
poisoned BTB entry and using that safe one for all function returns.
In older Zen1 and Zen2, this is accomplished using a reinterpretation
technique similar to the Retbleed one: srso_untrain_ret() and
srso_safe_ret().
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Function elf_open_read() only zero initializes the initial part of
allocated struct elf; num_relocs member was recently added outside the
zeroed part so that it was left uninitialized, resulting in build failures
on some systems.
The partial initialization is a relic of times when struct elf had large
hash tables embedded. This is no longer the case so remove the trap and
initialize the whole structure instead.
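The failure mode is easy to reproduce in miniature. The sketch below uses Python's ctypes to imitate a C structure; the `Elf` layout is hypothetical (only num_relocs is taken from the text), and the 0xAA fill stands in for whatever stale bytes an allocator might return.

```python
import ctypes


class Elf(ctypes.Structure):
    """Hypothetical stand-in for struct elf; only num_relocs is from the text."""
    _fields_ = [
        ("fd", ctypes.c_int),
        ("changed", ctypes.c_int),
        ("num_relocs", ctypes.c_uint),  # added after the zeroed prefix
    ]


def alloc_dirty() -> Elf:
    """Simulate an allocation that returns memory full of stale bytes."""
    elf = Elf()
    ctypes.memset(ctypes.addressof(elf), 0xAA, ctypes.sizeof(elf))
    return elf


def init_partial(elf: Elf) -> None:
    # Old behaviour: zero only the bytes that precede num_relocs.
    ctypes.memset(ctypes.addressof(elf), 0, Elf.num_relocs.offset)


def init_full(elf: Elf) -> None:
    # Fixed behaviour: zero the whole structure.
    ctypes.memset(ctypes.addressof(elf), 0, ctypes.sizeof(elf))
```

After `init_partial()` the leading fields read as zero but num_relocs still holds the stale pattern; after `init_full()` every field is zero regardless of where new members are added.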
Fixes: eb0481bbc4 ("objtool: Fix reloc_hash size")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@kernel.org>
Link: https://lore.kernel.org/r/20230629102051.42E8360467@lion.mk-sys.cz
The objtool merge in commit 6f612579be ("Merge tag 'objtool-core ...")
generated a semantic conflict that was not resolved.
The btrfs_assertfail() entry was removed from the noreturn list in
commit b831306b3b ("btrfs: print assertion failure report and stack
trace from the same line") because btrfs_assertfail() was changed from a
noreturn function into a macro.
The noreturn list was then moved from check.c to noreturns.h in commit
6245ce4ab6 ("objtool: Move noreturn function list to separate file"),
and the entry should be removed from that file post-merge as well.
Do it explicitly.
Cc: David Sterba <dsterba@suse.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Build footprint & performance improvements:
- Reduce memory usage with CONFIG_DEBUG_INFO=y
In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel, DWARF
creates almost 200 million relocations, ballooning objtool's peak heap
usage to 53GB. These patches reduce that to 25GB.
On a distro-type kernel with kernel IBT enabled, they reduce objtool's
peak heap usage from 4.2GB to 2.8GB.
These changes also improve the runtime significantly.
- Debuggability improvements:
- Add the unwind_debug command-line option, for more extended unwinding
debugging output.
- Limit unreachable warnings to once per function
- Add verbose option for disassembling affected functions
- Include backtrace in verbose mode
- Detect missing __noreturn annotations
- Ignore exc_double_fault() __noreturn warnings
- Remove superfluous global_noreturns entries
- Move noreturn function list to separate file
- Add __kunit_abort() to noreturns
- Unwinder improvements:
- Allow stack operations in UNWIND_HINT_UNDEFINED regions
- drm/vmwgfx: Add unwind hints around RBP clobber
- Cleanups:
- Move the x86 entry thunk restore code into thunk functions
- x86/unwind/orc: Use swap() instead of open coding it
- Remove unnecessary/unused variables
- Fixes for modern stack canary handling
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
"Build footprint & performance improvements:
- Reduce memory usage with CONFIG_DEBUG_INFO=y
In the worst case of an allyesconfig+CONFIG_DEBUG_INFO=y kernel,
DWARF creates almost 200 million relocations, ballooning objtool's
peak heap usage to 53GB. These patches reduce that to 25GB.
On a distro-type kernel with kernel IBT enabled, they reduce
objtool's peak heap usage from 4.2GB to 2.8GB.
These changes also improve the runtime significantly.
Debuggability improvements:
- Add the unwind_debug command-line option, for more extended unwinding
debugging output
- Limit unreachable warnings to once per function
- Add verbose option for disassembling affected functions
- Include backtrace in verbose mode
- Detect missing __noreturn annotations
- Ignore exc_double_fault() __noreturn warnings
- Remove superfluous global_noreturns entries
- Move noreturn function list to separate file
- Add __kunit_abort() to noreturns
Unwinder improvements:
- Allow stack operations in UNWIND_HINT_UNDEFINED regions
- drm/vmwgfx: Add unwind hints around RBP clobber
Cleanups:
- Move the x86 entry thunk restore code into thunk functions
- x86/unwind/orc: Use swap() instead of open coding it
- Remove unnecessary/unused variables
Fixes for modern stack canary handling"
* tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
x86/orc: Make the is_callthunk() definition depend on CONFIG_BPF_JIT=y
objtool: Skip reading DWARF section data
objtool: Free insns when done
objtool: Get rid of reloc->rel[a]
objtool: Shrink elf hash nodes
objtool: Shrink reloc->sym_reloc_entry
objtool: Get rid of reloc->jump_table_start
objtool: Get rid of reloc->addend
objtool: Get rid of reloc->type
objtool: Get rid of reloc->offset
objtool: Get rid of reloc->idx
objtool: Get rid of reloc->list
objtool: Allocate relocs in advance for new rela sections
objtool: Add for_each_reloc()
objtool: Don't free memory in elf_close()
objtool: Keep GElf_Rel[a] structs synced
objtool: Add elf_create_section_pair()
objtool: Add mark_sec_changed()
objtool: Fix reloc_hash size
objtool: Consolidate rel/rela handling
...
This KUnit update for Linux 6.5-rc1 consists of:
- kunit_add_action() API to defer a call until test exit.
- Update document to add kunit_add_action() usage notes.
- Changes to always run cleanup from a test kthread.
- Documentation updates to clarify cleanup usage
- assertions should not be used in cleanup
- Documentation update to clearly indicate that exit
functions should run even if init fails
- Several fixes and enhancements to existing tests.
Merge tag 'linux-kselftest-kunit-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- kunit_add_action() API to defer a call until test exit
- Update document to add kunit_add_action() usage notes
- Changes to always run cleanup from a test kthread
- Documentation updates to clarify cleanup usage (assertions should not
be used in cleanup)
- Documentation update to clearly indicate that exit functions should
run even if init fails
- Several fixes and enhancements to existing tests
* tag 'linux-kselftest-kunit-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
MAINTAINERS: Add source tree entry for kunit
Documentation: kunit: Rename references to kunit_abort()
kunit: Move kunit_abort() call out of kunit_do_failed_assertion()
kunit: Fix obsolete name in documentation headers (func->action)
Documentation: Kunit: add MODULE_LICENSE to sample code
kunit: Update kunit_print_ok_not_ok function
kunit: Fix reporting of the skipped parameterized tests
kunit/test: Add example test showing parameterized testing
Documentation: kunit: Add usage notes for kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
kunit: executor_test: Use kunit_add_action()
kunit: Add kunit_add_action() to defer a call until test exit
kunit: example: Provide example exit functions
Documentation: kunit: Warn that exit functions run even if init fails
Documentation: kunit: Note that assertions should not be used in cleanup
kunit: Always run cleanup from a test kthread
Documentation: kunit: Modular tests should not depend on KUNIT=y
kunit: tool: undo type subscripts for subprocess.Popen
- Up until now the Fast Short Rep Mov optimizations implied the presence
of the ERMS CPUID flag. AMD decoupled them with a BIOS setting so decouple
that dependency in the kernel code too
- Teach the alternatives machinery to handle relocations
- Make debug_alternative accept flags in order to see only the set of
patching one is interested in
- Other fixes, cleanups and optimizations to the patching code
Merge tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 instruction alternatives updates from Borislav Petkov:
- Up until now the Fast Short Rep Mov optimizations implied the
presence of the ERMS CPUID flag. AMD decoupled them with a BIOS
setting so decouple that dependency in the kernel code too
- Teach the alternatives machinery to handle relocations
- Make debug_alternative accept flags in order to see only the set of
patching one is interested in
- Other fixes, cleanups and optimizations to the patching code
* tag 'x86_alternatives_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternative: PAUSE is not a NOP
x86/alternatives: Add cond_resched() to text_poke_bp_batch()
x86/nospec: Shorten RESET_CALL_DEPTH
x86/alternatives: Add longer 64-bit NOPs
x86/alternatives: Fix section mismatch warnings
x86/alternative: Optimize returns patching
x86/alternative: Complicate optimize_nops() some more
x86/alternative: Rewrite optimize_nops() some
x86/lib/memmove: Decouple ERMS from FSRM
x86/alternative: Support relocations in alternatives
x86/alternative: Make debug-alternative selective
Assertion reports are split into two parts, the exact file and location
of the condition and then the stack trace printed from
btrfs_assertfail(). This means all the stack traces report the same line
and this is what's typically reported by various tools, making it harder
to distinguish the reports.
[403.2467] assertion failed: refcount_read(&block_group->refs) == 1, in fs/btrfs/block-group.c:4259
[403.2479] ------------[ cut here ]------------
[403.2484] kernel BUG at fs/btrfs/messages.c:259!
[403.2488] invalid opcode: 0000 [#1] PREEMPT SMP KASAN
[403.2493] CPU: 2 PID: 23202 Comm: umount Not tainted 6.2.0-rc4-default+ #67
[403.2499] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[403.2509] RIP: 0010:btrfs_assertfail+0x19/0x1b [btrfs]
...
[403.2595] Call Trace:
[403.2598] <TASK>
[403.2601] btrfs_free_block_groups.cold+0x52/0xae [btrfs]
[403.2608] close_ctree+0x6c2/0x761 [btrfs]
[403.2613] ? __wait_for_common+0x2b8/0x360
[403.2618] ? btrfs_cleanup_one_transaction.cold+0x7a/0x7a [btrfs]
[403.2626] ? mark_held_locks+0x6b/0x90
[403.2630] ? lockdep_hardirqs_on_prepare+0x13d/0x200
[403.2636] ? __call_rcu_common.constprop.0+0x1ea/0x3d0
[403.2642] ? trace_hardirqs_on+0x2d/0x110
[403.2646] ? __call_rcu_common.constprop.0+0x1ea/0x3d0
[403.2652] generic_shutdown_super+0xb0/0x1c0
[403.2657] kill_anon_super+0x1e/0x40
[403.2662] btrfs_kill_super+0x25/0x30 [btrfs]
[403.2668] deactivate_locked_super+0x4c/0xc0
By making btrfs_assertfail a macro we'll get the same line number for
the BUG output:
[63.5736] assertion failed: 0, in fs/btrfs/super.c:1572
[63.5758] ------------[ cut here ]------------
[63.5782] kernel BUG at fs/btrfs/super.c:1572!
[63.5807] invalid opcode: 0000 [#2] PREEMPT SMP KASAN
[63.5831] CPU: 0 PID: 859 Comm: mount Tainted: G D 6.3.0-rc7-default+ #2062
[63.5868] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[63.5905] RIP: 0010:btrfs_mount+0x24/0x30 [btrfs]
[63.5964] RSP: 0018:ffff88800e69fcd8 EFLAGS: 00010246
[63.5982] RAX: 000000000000002d RBX: ffff888008fc1400 RCX: 0000000000000000
[63.6004] RDX: 0000000000000000 RSI: ffffffffb90fd868 RDI: ffffffffbcc3ff20
[63.6026] RBP: ffffffffc081b200 R08: 0000000000000001 R09: ffff88800e69fa27
[63.6046] R10: ffffed1001cd3f44 R11: 0000000000000001 R12: ffff888005a3c370
[63.6062] R13: ffffffffc058e830 R14: 0000000000000000 R15: 00000000ffffffff
[63.6081] FS: 00007f7b3561f800(0000) GS:ffff88806c600000(0000) knlGS:0000000000000000
[63.6105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63.6120] CR2: 00007fff83726e10 CR3: 0000000002a9e000 CR4: 00000000000006b0
[63.6137] Call Trace:
[63.6143] <TASK>
[63.6148] legacy_get_tree+0x80/0xd0
[63.6158] vfs_get_tree+0x43/0x120
[63.6166] do_new_mount+0x1f3/0x3d0
[63.6176] ? do_add_mount+0x140/0x140
[63.6187] ? cap_capable+0xa4/0xe0
[63.6197] path_mount+0x223/0xc10
This comes at a cost of bloating the final btrfs.ko module due to all the
inlining, as long as assertions are compiled in. This is a must for
debugging builds but this is often enabled on release builds too.
Release build:
   text    data     bss     dec     hex filename
1251676   20317   16088 1288081  13a791 pre/btrfs.ko
1260612   29473   16088 1306173  13ee3d post/btrfs.ko
DELTA: +8936
CC: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: David Sterba <dsterba@suse.com>
Objtool doesn't use DWARF at all, and the DWARF sections' data take up a
lot of memory. Skip reading them.
Note this only skips the DWARF base sections, not the rela sections.
The relas are needed because their symbol references may need to be
reindexed if any local symbols get added by elf_create_symbol().
Also note the DWARF data will eventually be read by libelf anyway, when
writing the object file. But that's fine, the goal here is to reduce
*peak* memory usage, and the previous patch (which freed insn memory)
gave some breathing room. So the allocation gets shifted to a later
time, resulting in lower peak memory usage.
With allyesconfig + CONFIG_DEBUG_INFO:
- Before: peak heap memory consumption: 29.93G
- After: peak heap memory consumption: 25.47G
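The measured saving above is 29.93G - 25.47G = 4.46G, even though the DWARF data is still read eventually. Why moving an allocation later lowers the peak can be seen in a small model. The Python sketch below tracks a running total over a list of allocation events; the event names and sizes are made up and only the ordering matters.

```python
def peak_usage(events):
    """Return the peak of a running total over (label, delta_in_GB) events."""
    current = peak = 0.0
    for _label, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


# Made-up sizes, chosen only to show the effect of reordering.
eager = [("read DWARF", 4), ("decode insns", 20),
         ("free insns", -20), ("write file", 2)]
lazy = [("decode insns", 20), ("free insns", -20),
        ("read DWARF at write time", 4), ("write file", 2)]

print(peak_usage(eager), peak_usage(lazy))
```

In the eager ordering the DWARF data is resident while the instructions are decoded, so both add up at the peak; in the lazy ordering it is only read after the instruction memory has been freed.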
Link: https://lore.kernel.org/r/52a9698835861dd35f2ec35c49f96d0bb39fb177.1685464332.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
When creating an annotation section, allocate the reloc section data at
the beginning. This simplifies the data model a bit and also saves
memory due to the removal of malloc() in elf_rebuild_reloc_section().
With allyesconfig + CONFIG_DEBUG_INFO:
- Before: peak heap memory consumption: 53.49G
- After: peak heap memory consumption: 49.02G
Link: https://lore.kernel.org/r/048e908f3ede9b66c15e44672b6dda992b1dae3e.1685464332.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
The GElf_Rel[a] structs have more similarities than differences. It's
safe to hard-code the assumptions about their shared fields as they will
never change. Consolidate their handling where possible, getting rid of
duplicated code.
Also, at least for now we only ever create rela sections, so simplify
the relocation creation code to be rela-only.
Link: https://lore.kernel.org/r/dcabf6df400ca500ea929f1e4284f5e5ec0b27c8.1685464332.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
- The term "reloc" is overloaded to mean both "an instance of struct
reloc" and "a reloc section". Change the latter to "rsec".
- For variable names, use "sec" for regular sections and "rsec" for rela
sections to prevent them getting mixed up.
- For struct reloc variables, use "reloc" instead of "rel" everywhere
for consistency.
Link: https://lore.kernel.org/r/8b790e403df46f445c21003e7893b8f53b99a6f3.1685464332.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
There are several places where the 'warnings' variable is not needed;
remove it and return 0 directly.
Signed-off-by: Lu Hongfei <luhongfei@vivo.com>
Link: https://lore.kernel.org/r/20230530075649.21661-1-luhongfei@vivo.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
This is a hack, but it works for now.
Problem is, exc_double_fault() may or may not return, depending on
whether CONFIG_X86_ESPFIX64 is set. But objtool has no visibility to
the kernel config.
"Fix" it by silencing the exc_double_fault() __noreturn warning.
This removes the following warning:
vmlinux.o: warning: objtool: xenpv_exc_double_fault+0xd: exc_double_fault() is missing a __noreturn annotation
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/a45b085071d3a7d049a20f9e78754452336ecbe8.1681853186.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Unreachable instruction warnings are limited to once per object file.
That no longer makes sense for vmlinux validation, which might have
more unreachable instructions lurking in other places. Change it to
once per function.
Note this affects some other (much rarer) non-fatal warnings as well.
In general I think one-warning-per-function makes sense, as related
warnings can accumulate quickly and we want to eventually get back to
failing the build with -Werror anyway.
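The once-per-function policy amounts to remembering which functions have already produced a warning. A minimal Python sketch follows, assuming warnings arrive as (function name, message) pairs; that representation and the name `filter_warnings` are hypothetical and not how objtool stores them.

```python
def filter_warnings(warnings):
    """Keep only the first warning reported for each function.

    `warnings` is an iterable of (function_name, message) pairs.
    """
    seen = set()
    kept = []
    for func, msg in warnings:
        if func in seen:
            continue  # this function has already been reported once
        seen.add(func)
        kept.append((func, msg))
    return kept
```

Compared with a single per-object flag, this still reports the first problem in every other function of a large object such as vmlinux.o.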
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lore.kernel.org/r/9d38f881bfc34e031c74e4e90064ccb3e49f599a.1681853186.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
A little while ago someone (Kirill) ran into the whole 'alternatives don't
do relocations' nonsense again and I got annoyed enough to actually look
at the code.
Since the whole alternative machinery already fully decodes the
instructions it is simple enough to adjust immediates and displacements
when needed. Specifically, the immediates for IP modifying instructions
(JMP, CALL, Jcc) and the displacement for RIP-relative instructions.
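The adjustment itself is simple arithmetic: a relative operand is measured from the end of the instruction, so copying the instruction elsewhere must change the operand by the distance moved. The Python sketch below shows that calculation for a 32-bit displacement. The function names are hypothetical, and it ignores cases such as branches whose target lies within the copied block itself.

```python
def target(insn_addr: int, insn_len: int, disp: int) -> int:
    """Absolute target of a relative branch or RIP-relative operand."""
    return insn_addr + insn_len + disp


def relocate_rel32(disp: int, src: int, dst: int) -> int:
    """Return the displacement that keeps the same absolute target after
    an instruction is copied from address `src` to address `dst`."""
    new_disp = disp + (src - dst)
    if not -(1 << 31) <= new_disp < (1 << 31):
        raise OverflowError("target not reachable with a 32-bit displacement")
    return new_disp
```

For example, a 5-byte CALL at 0x1000 with displacement 0x100 targets 0x1105; copied to 0x5000, `relocate_rel32(0x100, 0x1000, 0x5000)` gives -0x3f00, and `target(0x5000, 5, -0x3f00)` is again 0x1105.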
[ bp: Massage comment some more and get rid of third loop in
apply_relocation(). ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.313857925@infradead.org
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
this inconsistently follow this new, common convention, and fix all the fallout
that objtool can now detect statically.
- Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
- Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
- Generate ORC data for __pfx code
- Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
- Misc improvements & fixes.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool updates from Ingo Molnar:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures &
drivers that did this inconsistently follow this new, common
convention, and fix all the fallout that objtool can now detect
statically
- Fix/improve the ORC unwinder becoming unreliable due to
UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
and UNWIND_HINT_UNDEFINED to resolve it
- Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code
- Generate ORC data for __pfx code
- Add more __noreturn annotations to various kernel startup/shutdown
and panic functions
- Misc improvements & fixes
* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
x86/hyperv: Mark hv_ghcb_terminate() as noreturn
scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
btrfs: Mark btrfs_assertfail() __noreturn
objtool: Include weak functions in global_noreturns check
cpu: Mark nmi_panic_self_stop() __noreturn
cpu: Mark panic_smp_self_stop() __noreturn
arm64/cpu: Mark cpu_park_loop() and friends __noreturn
x86/head: Mark *_start_kernel() __noreturn
init: Mark start_kernel() __noreturn
init: Mark [arch_call_]rest_init() __noreturn
objtool: Generate ORC data for __pfx code
x86/linkage: Fix padding for typed functions
objtool: Separate prefix code from stack validation code
objtool: Remove superfluous dead_end_function() check
objtool: Add symbol iteration helpers
objtool: Add WARN_INSN()
scripts/objdump-func: Support multiple functions
context_tracking: Fix KCSAN noinstr violation
objtool: Add stackleak instrumentation to uaccess safe list
...
Merge tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Mostly core changes and cleanups, some notable fixes and two
performance improvements in directory logging.
The IO path cleanups remove or refactor old code; the scrub main
loop has been completely rewritten, also refactoring old code.
There are some changes to non-btrfs code, mostly trivial, the cgroup
punt bio logic is only moved from generic code.
Performance improvements:
- improve logging changes in a directory during one transaction,
avoid iterating over items and reduce lock contention (fsync time
4x lower)
- when logging directory entries during one transaction, reduce
locking of subvolume trees by checking tree-log instead
(improvement in throughput and latency for concurrent access to a
subvolume)
Notable fixes:
- dev-replace:
- properly honor read mode when requested to avoid reading from
source device
- target device won't be used for eventual read repair, this is
unreliable for NODATASUM files
- when there is unrepaired (and unrepairable) metadata during
replace, exit early with error and don't try to finish whole
operation
- scrub ioctl properly rejects unknown flags
- fix global block reserve calculations
- fix partial direct io write when there's a page fault in the
middle, iomap will try to continue with partial request but the
btrfs part did not match that, this can lead to zeros written
instead of data
Core changes:
- io path:
- continued cleanups and refactoring around bio handling
- extent io submit path simplifications and cleanups
- flush write path simplifications and cleanups
- rework logic of passing sync mode of bio, with further cleanups
- rewrite scrub code flow, restructure how the stripes are enumerated
and verified in a more unified way
- allow to set lower threshold for block group reclaim in debug mode
to aid zoned mode testing
- remove obsolete time-based delayed ref throttling logic when
truncating items
- DREW locks are not using percpu variables anymore
- more warning fixes (-Wmaybe-uninitialized)
- u64 division simplifications
- error handling improvements
Non-btrfs code changes:
- push cgroup punt bio logic to btrfs code (there was no other user
of that), the functionality can now be selected separately by
BLK_CGROUP_PUNT_BIO
- crc32c_impl removed after removing last uses in btrfs code
- add btrfs_assertfail() to objtool table"
* tag 'for-6.4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits)
btrfs: mark btrfs_assertfail() __noreturn
btrfs: fix uninitialized variable warnings
btrfs: use log root when iterating over index keys when logging directory
btrfs: avoid iterating over all indexes when logging directory
btrfs: dev-replace: error out if we have unrepaired metadata error during
btrfs: remove pointless loop at btrfs_get_next_valid_item()
btrfs: scrub: reject unsupported scrub flags
btrfs: reinterpret async discard iops_limit=0 as no delay
btrfs: set default discard iops_limit to 1000
btrfs: remove unused raid56 functions which were dedicated for scrub
btrfs: scrub: remove scrub_bio structure
btrfs: scrub: remove scrub_block and scrub_sector structures
btrfs: scrub: remove the old scrub recheck code
btrfs: scrub: remove the old writeback infrastructure
btrfs: scrub: remove scrub_parity structure
btrfs: scrub: use scrub_stripe to implement RAID56 P/Q scrub
btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure
btrfs: scrub: introduce helper to queue a stripe for scrub
btrfs: scrub: introduce error reporting functionality for scrub_stripe
btrfs: scrub: introduce a writeback helper for scrub_stripe
...
Merge tag 'docs-6.4' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Commit volume in documentation is relatively low this time, but there
is still a fair amount going on, including:
- Reorganize the architecture-specific documentation under
Documentation/arch
This makes the structure match the source directory and helps to
clean up the mess that is the top-level Documentation directory a
bit. This work creates the new directory and moves x86 and most of
the less-active architectures there.
The current plan is to move the rest of the architectures in 6.5,
with the patches going through the appropriate subsystem trees.
- Some more Spanish translations and maintenance of the Italian
translation
- A new "Kernel contribution maturity model" document from Ted
- A new tutorial on quickly building a trimmed kernel from Thorsten
Plus the usual set of updates and fixes"
* tag 'docs-6.4' of git://git.lwn.net/linux: (47 commits)
media: Adjust column width for pdfdocs
media: Fix building pdfdocs
docs: clk: add documentation to log which clocks have been disabled
docs: trace: Fix typo in ftrace.rst
Documentation/process: always CC responsible lists
docs: kmemleak: adjust to config renaming
ELF: document some de-facto PT_* ABI quirks
Documentation: arm: remove stih415/stih416 related entries
docs: turn off "smart quotes" in the HTML build
Documentation: firmware: Clarify firmware path usage
docs/mm: Physical Memory: Fix grammar
Documentation: Add document for false sharing
dma-api-howto: typo fix
docs: move m68k architecture documentation under Documentation/arch/
docs: move parisc documentation under Documentation/arch/
docs: move ia64 architecture docs under Documentation/arch/
docs: Move arc architecture docs under Documentation/arch/
docs: move nios2 documentation under Documentation/arch/
docs: move openrisc documentation under Documentation/arch/
docs: move superh documentation under Documentation/arch/
...
The old 'copy_user_generic_unrolled' function was oddly implemented for
largely historical reasons: it had been largely based on the uncached
copy case, which has some other concerns.
For example, the __copy_user_nocache() function uses 'movnti' for the
destination stores, and those want the destination to be aligned. In
contrast, the regular copy function doesn't really care, and trying to
align things only complicates matters.
Also, like the clear_user function, the copy function had some odd
handling of the repeat counts, complicating the exception handling for
no really good reason. So as with clear_user, just write it to keep all
the byte counts in the %rcx register, exactly like the 'rep movs'
functionality that this replaces.
Unlike a real 'rep movs', we do allow for this to trash a few temporary
registers to not have to unnecessarily save/restore registers on the
stack.
And like the clearing case, rename this to what it now clearly is:
'rep_movs_alternative', and make it one coherent function, so that it
shows up as such in profiles (instead of the odd split between
"copy_user_generic_unrolled" and "copy_user_short_string", the latter of
which was not about strings at all, and which was shared with the
uncached case).
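As a rough illustration of the "keep all the byte counts in one
register" idea, here is a user-space C model, not the kernel's
assembly. The function name rep_movs_model() and the fault_after
parameter (which stands in for a page fault after that many bytes) are
assumptions made for this sketch. The variable 'count' plays the role
of %rcx: it always holds the number of bytes still to copy, so the
value to report on a fault is simply whatever is left in it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'count' bytes from src to dst. Bytes at offset >= fault_after
 * are treated as inaccessible. Returns the number of bytes NOT copied
 * (0 on full success), mirroring what 'rep movs' leaves in %rcx.
 */
static size_t rep_movs_model(unsigned char *dst, const unsigned char *src,
			     size_t count, size_t fault_after)
{
	size_t done = 0;

	/* Word-at-a-time loop; 'count' stays a plain byte count. */
	while (count >= 8) {
		if (done + 8 > fault_after)
			break;		/* word would fault: retry bytewise */
		memcpy(dst + done, src + done, 8);
		done += 8;
		count -= 8;
	}

	/* Byte tail, also used to find the exact fault position. */
	while (count) {
		if (done >= fault_after)
			return count;	/* "exception": report what is left */
		dst[done] = src[done];
		done++;
		count--;
	}
	return 0;
}
```

Because there are no sub-counts spread over several variables, the
fault path needs no fix-up arithmetic; that is the simplification the
message describes, shown here only in model form.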
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The old version was oddly written to have the repeat count in multiple
registers. So instead of taking advantage of %rax being zero, it had
some sub-counts in it. All just for a "single word clearing" loop,
which isn't even efficient to begin with.
So get rid of those games, and just keep all the state in the same
registers we got it in (and that we should return things in). That not
only makes this act much more like 'rep stos' (which this function is
replacing), but makes it much easier to actually do the obvious loop
unrolling.
Also rename the function from the now nonsensical 'clear_user_original'
to what it now clearly is: 'rep_stos_alternative'.
End result: if we don't have a fast 'rep stosb', at least we can have a
fast fallback for it.
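The same single-counter approach makes loop unrolling straightforward.
The following is a user-space C model rather than the actual assembly;
rep_stos_model() and its 'writable' parameter (the number of bytes that
can be written before a simulated fault) are hypothetical. The 64-byte
step is an assumption chosen to show the unrolling, and 'count' again
stands in for %rcx.

```c
#include <stddef.h>
#include <string.h>

/*
 * Zero 'count' bytes at dst. Bytes at offset >= writable are treated
 * as faulting. Returns the number of bytes NOT cleared, like the
 * residual count of 'rep stos'.
 */
static size_t rep_stos_model(unsigned char *dst, size_t count,
			     size_t writable)
{
	size_t off = 0;

	/* Unrolled: eight 8-byte stores per iteration. */
	while (count >= 64 && off + 64 <= writable) {
		for (int i = 0; i < 8; i++)
			memset(dst + off + 8 * i, 0, 8);
		off += 64;
		count -= 64;
	}

	/* Remaining whole words. */
	while (count >= 8 && off + 8 <= writable) {
		memset(dst + off, 0, 8);
		off += 8;
		count -= 8;
	}

	/* Byte tail; stops exactly at the simulated fault. */
	while (count && off < writable) {
		dst[off++] = 0;
		count--;
	}
	return count;
}
```

Each loop only ever decrements the one counter, so the return value is
correct no matter which loop stops early.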
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>