Since Warn count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/warn_count to expose it to userspace.
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-6-keescook@chromium.org
Like oops_limit, add warn_limit for limiting the number of warnings when
panic_on_warn is not set.
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-5-keescook@chromium.org
Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll
their own warnings, and each check "panic_on_warn". Consolidate this
into a single function so that future instrumentation can be added in
a single location.
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Gow <davidgow@google.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: Jann Horn <jannh@google.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: kasan-dev@googlegroups.com
Cc: linux-mm@kvack.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org
Since Oops count is now tracked and is a fairly interesting signal, add
the entry /sys/kernel/oops_count to expose it to userspace.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Jann Horn <jannh@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-3-keescook@chromium.org
Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.
The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)
So let's panic the system if the kernel is constantly oopsing.
The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)
It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.
12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
In preparation for adding more sysctls directly in kernel/panic.c, split
CONFIG_SMP from the logic that adds sysctls.
Cc: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-1-keescook@chromium.org
The actual size of the following arrays at run-time depends on
CONFIG_X86_PAE.
427 pmd_t *u_pmds[MAX_PREALLOCATED_USER_PMDS];
428 pmd_t *pmds[MAX_PREALLOCATED_PMDS];
If CONFIG_X86_PAE is not enabled, their final size will be zero (which
is technically not a legal storage size in C, but remains "valid" via
the GNU extension). In that case, the compiler complains about trying to
access objects of size zero when calling functions where these objects
are passed as arguments.
Fix this by sanity-checking the size of those arrays just before the
function calls. Also, the following warnings are fixed by these changes
when building with GCC 11+ and -Wstringop-overflow enabled:
arch/x86/mm/pgtable.c:437:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:440:13: warning: ‘preallocate_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:462:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:455:9: warning: ‘pgd_prepopulate_user_pmd’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
arch/x86/mm/pgtable.c:464:9: warning: ‘free_pmds.constprop’ accessing 8 bytes in a region of size 0 [-Wstringop-overflow=]
This is one of the last cases in the ongoing effort to globally enable
-Wstringop-overflow.
The alternative to this is to make the originally suggested change:
make the pmds argument from an array pointer to a pointer pointer. That
situation is considered "legal" for C in the sense that it does not have
a way to reason about the storage. i.e.:
-static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
+static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t **pmds)
With the above change, there's no difference in binary output, and the
compiler warning is silenced.
However, with this patch, the compiler can actually figure out that it
isn't using the code at all, and it gets dropped:
text data bss dec hex filename
8218 718 32 8968 2308 arch/x86/mm/pgtable.o.before
7765 694 32 8491 212b arch/x86/mm/pgtable.o.after
So this case (fixing a warning and reducing image size) is a clear win.
Additionally drops an old work-around for GCC in the same code.
Link: https://github.com/KSPP/linux/issues/203
Link: https://github.com/KSPP/linux/issues/181
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/Yytb67xvrnctxnEe@work
With all "silently resizing" callers of ksize() refactored, remove the
logic in ksize() that would allow it to be used to effectively change
the size of an allocation (bypassing __alloc_size hints, etc). Users
wanting this feature need to either use kmalloc_size_roundup() before an
allocation, or use krealloc() directly.
For kfree_sensitive(), move the unpoisoning logic inline. Replace the
some of the partially open-coded ksize() in __do_krealloc with ksize()
now that it doesn't perform unpoisoning.
Adjust the KUnit tests to match the new ksize() behavior. Execution
tested with:
$ ./tools/testing/kunit/kunit.py run \
--kconfig_add CONFIG_KASAN=y \
--kconfig_add CONFIG_KASAN_GENERIC=y \
--arch x86_64 kasan
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: linux-mm@kvack.org
Cc: kasan-dev@googlegroups.com
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Enhanced-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/gpu/drm/sti/sti_hda.c:637:16: error: incompatible function pointer types initializing 'enum drm_mode_status (*)(struct drm_connector *, struct drm_display_mode *)' with an expression of type 'int (struct drm_connector *, struct drm_display_mode *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.mode_valid = sti_hda_connector_mode_valid,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/sti/sti_dvo.c:376:16: error: incompatible function pointer types initializing 'enum drm_mode_status (*)(struct drm_connector *, struct drm_display_mode *)' with an expression of type 'int (struct drm_connector *, struct drm_display_mode *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.mode_valid = sti_dvo_connector_mode_valid,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/sti/sti_hdmi.c:1035:16: error: incompatible function pointer types initializing 'enum drm_mode_status (*)(struct drm_connector *, struct drm_display_mode *)' with an expression of type 'int (struct drm_connector *, struct drm_display_mode *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.mode_valid = sti_hdmi_connector_mode_valid,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
->mode_valid() in 'struct drm_connector_helper_funcs' expects a return
type of 'enum drm_mode_status', not 'int'. Adjust the return type of
sti_{dvo,hda,hdmi}_connector_mode_valid() to match the prototype's to
resolve the warning and CFI failure.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102155623.3042869-1-nathan@kernel.org
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed. A
proposed warning in clang aims to catch these at compile time, which
reveals:
drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_rgb.c:74:16: error: incompatible function pointer types initializing 'enum drm_mode_status (*)(struct drm_connector *, struct drm_display_mode *)' with an expression of type 'int (struct drm_connector *, struct drm_display_mode *)' [-Werror,-Wincompatible-function-pointer-types-strict]
.mode_valid = fsl_dcu_drm_connector_mode_valid,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
->mode_valid() in 'struct drm_connector_helper_funcs' expects a return
type of 'enum drm_mode_status', not 'int'. Adjust the return type of
fsl_dcu_drm_connector_mode_valid() to match the prototype's to resolve
the warning and CFI failure.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reported-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221102154215.78059-1-nathan@kernel.org
Mark the devm_*alloc()-family of allocations with appropriate
__alloc_size()/__realloc_size() hints so the compiler can attempt to
reason about buffer lengths from allocations.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Nishanth Menon <nm@ti.com>
Cc: Michael Kelley <mikelley@microsoft.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Won Chung <wonchung@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20221029074734.gonna.276-kees@kernel.org
Commit d4c6399900 ("vmlinux.lds.h: Avoid orphan section with !SMP")
fixed an orphan section warning by adding the '.data..decrypted' section
to the linker script under the PERCPU_DECRYPTED_SECTION define but that
placement introduced a panic with !SMP, as the percpu sections are not
instantiated with that configuration so attempting to access variables
defined with DEFINE_PER_CPU_DECRYPTED() will result in a page fault.
Move the '.data..decrypted' section to the DATA_MAIN define so that the
variables in it are properly instantiated at boot time with
CONFIG_SMP=n.
Cc: stable@vger.kernel.org
Fixes: d4c6399900 ("vmlinux.lds.h: Avoid orphan section with !SMP")
Link: https://lore.kernel.org/cbbd3548-880c-d2ca-1b67-5bb93b291d5f@huawei.com/
Debugged-by: Ard Biesheuvel <ardb@kernel.org>
Reported-by: Zhao Wenhui <zhaowenhui8@huawei.com>
Tested-by: xiafukun <xiafukun@huawei.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221108174934.3384275-1-nathan@kernel.org
Implement a robust overflows_type() macro to test if a variable or
constant value would overflow another variable or type. This can be
used as a constant expression for static_assert() (which requires a
constant expression[1][2]) when used on constant values. This must be
constructed manually, since __builtin_add_overflow() does not produce
a constant expression[3].
Additionally adds castable_to_type(), similar to __same_type(), but for
checking if a constant value would overflow if cast to a given type.
Add unit tests for overflows_type(), __same_type(), and castable_to_type()
to the existing KUnit "overflow" test:
[16:03:33] ================== overflow (21 subtests) ==================
...
[16:03:33] [PASSED] overflows_type_test
[16:03:33] [PASSED] same_type_test
[16:03:33] [PASSED] castable_to_type_test
[16:03:33] ==================== [PASSED] overflow =====================
[16:03:33] ============================================================
[16:03:33] Testing complete. Ran 21 tests: passed: 21
[16:03:33] Elapsed time: 24.022s total, 0.002s configuring, 22.598s building, 0.767s running
[1] https://en.cppreference.com/w/c/language/_Static_assert
[2] C11 standard (ISO/IEC 9899:2011): 6.7.10 Static assertions
[3] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
6.56 Built-in Functions to Perform Arithmetic with Overflow Checking
Built-in Function: bool __builtin_add_overflow (type1 a, type2 b,
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Vitor Massaru Iha <vitor@massaru.org>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: linux-hardening@vger.kernel.org
Cc: llvm@lists.linux.dev
Co-developed-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221024201125.1416422-1-gwan-gyeong.mun@intel.com
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Andrew Cooper suggested upgrading the orphan section warning to a hard link
error. However Nathan Chancellor said outright turning the warning into an
error with no escape hatch might be too aggressive, as we have had these
warnings triggered by new compiler generated sections, and suggested turning
orphan sections into an error only if CONFIG_WERROR is set. Kees Cook echoed
and emphasized that the mandate from Linus is that we should avoid breaking
builds. It wrecks bisection, it causes problems across compiler versions, etc.
Thus upgrade the orphan section warning to a hard link error only if
CONFIG_WERROR is set.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Xin Li <xin3.li@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221025073023.16137-2-xin3.li@intel.com
A common exploit pattern for ROP attacks is to abuse prepare_kernel_cred()
in order to construct escalated privileges[1]. Instead of providing a
short-hand argument (NULL) to the "daemon" argument to indicate using
init_cred as the base cred, require that "daemon" is always set to
an actual task. Replace all existing callers that were passing NULL
with &init_task.
Future attacks will need to have sufficiently powerful read/write
primitives to have found an appropriately privileged task and written it
to the ROP stack as an argument to succeed, which is similarly difficult
to the prior effort needed to escalate privileges before struct cred
existed: locate the current cred and overwrite the uid member.
This has the added benefit of meaning that prepare_kernel_cred() can no
longer exceed the privileges of the init task, which may have changed from
the original init_cred (e.g. dropping capabilities from the bounding set).
[1] https://google.com/search?q=commit_creds(prepare_kernel_cred(0))
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Steve French <sfrench@samba.org>
Cc: Ronnie Sahlberg <lsahlber@redhat.com>
Cc: Shyam Prasad N <sprasad@microsoft.com>
Cc: Tom Talpey <tom@talpey.com>
Cc: Namjae Jeon <linkinjeon@kernel.org>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Russ Weight <russell.h.weight@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Link: https://lore.kernel.org/r/20221026232943.never.775-kees@kernel.org
Replacing compile-time safe calls of strcpy()-related functions with
strscpy() was always calling the full strscpy() logic when a builtin
would be better. For example:
char buf[16];
strcpy(buf, "yes");
would reduce to __builtin_memcpy(buf, "yes", 4), but not if it was:
strscpy(buf, yes, sizeof(buf));
Fix this by checking if all sizes are known at compile-time.
Cc: linux-hardening@vger.kernel.org
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Add __realloc_size() hint to kmemdup() so the compiler can reason about
the length of the returned buffer. (These must not use __alloc_size,
since those include __malloc which says the contents aren't defined[1]).
[1] https://lore.kernel.org/linux-hardening/d199c2af-06af-8a50-a6a1-00eefa0b67b4@prevas.dk/
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Andy Shevchenko <andriy.shevchenko@intel.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
The "side effects" memmove() test accidentally found[1] a corner case in
the recent refactoring of the i386 assembly memmove(), but missed another
corner case. Instead of hoping to get lucky next time, implement much
more complete tests of memcpy() and memmove() -- especially the moving
window overlap for memmove() -- which catches all the issues encountered
and should catch anything new.
[1] https://lore.kernel.org/lkml/CAKwvOdkaKTa2aiA90VzFrChNQM6O_ro+b7VWs=op70jx-DKaXA@mail.gmail.com
Cc: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
While there were varying degrees of kern-doc for various str*()-family
functions, many needed updating and clarification, or to just be
entirely written. Update (and relocate) existing kern-doc and add missing
functions, sadly shaking my head at how many times I have written "Do
not use this function". Include the results in the core kernel API doc.
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-hardening@vger.kernel.org
Tested-by: Akira Yokosawa <akiyks@gmail.com>
Link: https://lore.kernel.org/lkml/9b0cf584-01b3-3013-b800-1ef59fe82476@gmail.com
Signed-off-by: Kees Cook <keescook@chromium.org>
In two recent run-time memcpy() bound checking bug reports (NFS[1] and
JFS[2]), the _detection_ was working correctly (in the sense that the
requested copy size was larger than the destination field size), but
the _warning text_ was showing the destination field size as SIZE_MAX
("unknown size"). This should be impossible, since the detection function
will explicitly give up if the destination field size is unknown. For
example, the JFS warning was:
memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 18446744073709551615)
Other cases of this warning (e.g.[3]) have reported correctly,
and the reproducer only happens under GCC (at least 10.2 and 12.1),
so this currently appears to be a GCC bug. Explicitly capturing the
__builtin_object_size() results in const temporary variables fixes the
report. For example, the JFS reproducer now correctly reports the field
size (128):
memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 128)
Examination of the .text delta (which is otherwise identical), shows
the literal value used in the report changing:
- mov $0xffffffffffffffff,%rcx
+ mov $0x80,%ecx
[1] https://lore.kernel.org/lkml/Y0zEzZwhOxTDcBTB@codemonkey.org.uk/
[2] https://syzkaller.appspot.com/bug?id=23d613df5259b977dac1696bec77f61a85890e3d
[3] https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com/
Cc: "Dr. David Alan Gilbert" <linux@treblig.org>
Cc: llvm@lists.linux.dev
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Convert test exclusion into test skipping. This brings the logic for
why a test is being skipped into the test itself, instead of having to
spread ifdefs around the code. This will make cleanup easier as minimum
tests get raised. Drop __maybe_unused so missed tests will be noticed
again and clean up whitespace.
For example, clang-11 on i386:
[15:52:32] ================== overflow (18 subtests) ==================
[15:52:32] [PASSED] u8_u8__u8_overflow_test
[15:52:32] [PASSED] s8_s8__s8_overflow_test
[15:52:32] [PASSED] u16_u16__u16_overflow_test
[15:52:32] [PASSED] s16_s16__s16_overflow_test
[15:52:32] [PASSED] u32_u32__u32_overflow_test
[15:52:32] [PASSED] s32_s32__s32_overflow_test
[15:52:32] [SKIPPED] u64_u64__u64_overflow_test
[15:52:32] [SKIPPED] s64_s64__s64_overflow_test
[15:52:32] [SKIPPED] u32_u32__int_overflow_test
[15:52:32] [PASSED] u32_u32__u8_overflow_test
[15:52:32] [PASSED] u8_u8__int_overflow_test
[15:52:32] [PASSED] int_int__u8_overflow_test
[15:52:32] [PASSED] shift_sane_test
[15:52:32] [PASSED] shift_overflow_test
[15:52:32] [PASSED] shift_truncate_test
[15:52:32] [PASSED] shift_nonsense_test
[15:52:32] [PASSED] overflow_allocation_test
[15:52:32] [PASSED] overflow_size_helpers_test
[15:52:32] ==================== [PASSED] overflow =====================
[15:52:32] ============================================================
[15:52:32] Testing complete. Ran 18 tests: passed: 15, skipped: 3
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Tom Rix <trix@redhat.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: llvm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20221006230017.1833458-1-keescook@chromium.org
Building the overflow kunit tests with clang-11 fails with:
$ ./tools/testing/kunit/kunit.py run --arch=arm --make_options LLVM=1 \
overflow
...
ld.lld: error: undefined symbol: __mulodi4
...
Clang 11 and earlier generate unwanted libcalls for signed output,
unsigned input.
Disable these tests for now, but should these become used in the kernel
we might consider that as justification for dropping clang-11 support.
Keep the clang-11 build alive a little bit longer.
Avoid -Wunused-function warnings via __maybe_unused. To test W=1:
$ make LLVM=1 -j128 defconfig
$ ./scripts/config -e KUNIT -e KUNIT_ALL
$ make LLVM=1 -j128 olddefconfig lib/overflow_kunit.o W=1
Link: https://github.com/ClangBuiltLinux/linux/issues/1711
Link: 3203143f13
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221006171751.3444575-1-ndesaulniers@google.com
Fix the kern-doc markings for several of the overflow helpers and move
their location into the core kernel API documentation, where it belongs
(it's not driver-specific).
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmNHYD0ACgkQSfxwEqXe
A655AA//dJK0PdRghqrKQsl18GOCffV5TUw5i1VbJQbI9d8anfxNjVUQiNGZi4et
qUwZ8OqVXxYx1Z1UDgUE39PjEDSG9/cCvOpMUWqN20/+6955WlNZjwA7Fk6zjvlM
R30fz5CIJns9RFvGT4SwKqbVLXIMvfg/wDENUN+8sxt36+VD2gGol7J2JJdngEhM
lW+zqzi0ABqYy5so4TU2kixpKmpC08rqFvQbD1GPid+50+JsOiIqftDErt9Eg1Mg
MqYivoFCvbAlxxxRh3+UHBd7ZpJLtp1UFEOl2Rf00OXO+ZclLCAQAsTczucIWK9M
8LCZjb7d4lPJv9RpXFAl3R1xvfc+Uy2ga5KeXvufZtc5G3aMUKPuIU7k28ZyblVS
XXsXEYhjTSd0tgi3d0JlValrIreSuj0z2QGT5pVcC9utuAqAqRIlosiPmgPlzXjr
Us4jXaUhOIPKI+Musv/fqrxsTQziT0jgVA3Njlt4cuAGm/EeUbLUkMWwKXjZLTsv
vDsBhEQFmyZqxWu4pYo534VX2mQWTaKRV1SUVVhQEHm57b00EAiZohoOvweB09SR
4KiJapikoopmW4oAUFotUXUL1PM6yi+MXguTuc1SEYuLz/tCFtK8DJVwNpfnWZpE
lZKvXyJnHq2Sgod/hEZq58PMvT6aNzTzSg7YzZy+VabxQGOO5mc=
=M+mV
-----END PGP SIGNATURE-----
Merge tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull more random number generator updates from Jason Donenfeld:
"This time with some large scale treewide cleanups.
The intent of this pull is to clean up the way callers fetch random
integers. The current rules for doing this right are:
- If you want a secure or an insecure random u64, use get_random_u64()
- If you want a secure or an insecure random u32, use get_random_u32()
The old function prandom_u32() has been deprecated for a while
now and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16()
- If you want a secure or an insecure random u8, use get_random_u8()
- If you want secure or insecure random bytes, use get_random_bytes().
The old function prandom_bytes() has been deprecated for a while
now and has long been a wrapper around get_random_bytes()
- If you want a non-uniform random u32, u16, or u8 bounded by a
certain open interval maximum, use prandom_u32_max()
I say "non-uniform", because it doesn't do any rejection sampling
or divisions. Hence, it stays within the prandom_*() namespace, not
the get_random_*() namespace.
I'm currently investigating a "uniform" function for 6.2. We'll see
what comes of that.
By applying these rules uniformly, we get several benefits:
- By using prandom_u32_max() with an upper-bound that the compiler
can prove at compile-time is ≤65536 or ≤256, internally
get_random_u16() or get_random_u8() is used, which wastes fewer
batched random bytes, and hence has higher throughput.
- By using prandom_u32_max() instead of %, when the upper-bound is
not a constant, division is still avoided, because
prandom_u32_max() uses a faster multiplication-based trick instead.
- By using get_random_u16() or get_random_u8() in cases where the
return value is intended to indeed be a u16 or a u8, we waste fewer
batched random bytes, and hence have higher throughput.
This series was originally done by hand while I was on an airplane
without Internet. Later, Kees and I worked on retroactively figuring
out what could be done with Coccinelle and what had to be done
manually, and then we split things up based on that.
So while this touches a lot of files, the actual amount of code that's
hand fiddled is comfortably small"
* tag 'random-6.1-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random:
prandom: remove unused functions
treewide: use get_random_bytes() when possible
treewide: use get_random_u32() when possible
treewide: use get_random_{u8,u16}() when possible, part 2
treewide: use get_random_{u8,u16}() when possible, part 1
treewide: use prandom_u32_max() when possible, part 2
treewide: use prandom_u32_max() when possible, part 1
- Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
when using bperf (perf BPF based counters) with cgroups.
- Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
monitors bandwidth, latency, bus utilization and buffer occupancy.
Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
- User space tasks can migrate between CPUs, so when tracing selected
CPUs, system-wide sideband is still needed, fix it in the setup of
Intel PT on hybrid systems.
- Fix metricgroups title message in 'perf list', it should state that
the metrics groups are to be used with the '-M' option, not '-e'.
- Sync the msr-index.h copy with the kernel sources, adding support
for using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as
well as decoding it when printing the MSR tracepoint arguments.
- Fix program header size and alignment when generating a JIT ELF
in 'perf inject'.
- Add multiple new Intel PT 'perf test' entries, including a jitdump one.
- Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
running on PowerPC due to an invalid topology number in that arch.
- Fix the 'perf test' for arm_coresight failures on the ARM Juno system.
- Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this option
to the or expression expected in the intercepted perf_event_open() syscall.
- Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the 'perf
annotate' asm parser.
- Fix 'perf mem record -C' option processing, it was being chopped up
when preparing the underlying 'perf record -e mem-events' and thus being
ignored, requiring using '-- -C CPUs' as a workaround.
- Improvements and tidy ups for 'perf test' shell infra.
- Fix Intel PT information printing segfault in uClibc, where a NULL
format was being passed to fprintf.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY0vyXAAKCRCyPKLppCJ+
J3EgAQDgr9FhTCTG+u46iGqPG4lxc46ZWKB3MgZwPuX6P2jwLwD9GCwGow4qHQVP
F/m7S/3tK/ShPfPWB2m4nVHd9xp7uwM=
=F1IB
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull more perf tools updates from Arnaldo Carvalho de Melo:
- Use BPF CO-RE (Compile Once, Run Everywhere) to support old kernels
when using bperf (perf BPF based counters) with cgroups.
- Support HiSilicon PCIe Performance Monitoring Unit (PMU), that
monitors bandwidth, latency, bus utilization and buffer occupancy.
Documented in Documentation/admin-guide/perf/hisi-pcie-pmu.rst.
- User space tasks can migrate between CPUs, so when tracing selected
CPUs, system-wide sideband is still needed, fix it in the setup of
Intel PT on hybrid systems.
- Fix metricgroups title message in 'perf list', it should state that
the metrics groups are to be used with the '-M' option, not '-e'.
- Sync the msr-index.h copy with the kernel sources, adding support for
using "AMD64_TSC_RATIO" in filter expressions in 'perf trace' as well
as decoding it when printing the MSR tracepoint arguments.
- Fix program header size and alignment when generating a JIT ELF in
'perf inject'.
- Add multiple new Intel PT 'perf test' entries, including a jitdump
one.
- Fix the 'perf test' entries for 'perf stat' CSV and JSON output when
running on PowerPC due to an invalid topology number in that arch.
- Fix the 'perf test' for arm_coresight failures on the ARM Juno
system.
- Fix the 'perf test' attr entry for PERF_FORMAT_LOST, adding this
option to the or expression expected in the intercepted
perf_event_open() syscall.
- Add missing condition flags ('hs', 'lo', 'vc', 'vs') for arm64 in the
'perf annotate' asm parser.
- Fix 'perf mem record -C' option processing, it was being chopped up
when preparing the underlying 'perf record -e mem-events' and thus
being ignored, requiring using '-- -C CPUs' as a workaround.
- Improvements and tidy ups for 'perf test' shell infra.
- Fix Intel PT information printing segfault in uClibc, where a NULL
format was being passed to fprintf.
* tag 'perf-tools-for-v6.1-2-2022-10-16' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (23 commits)
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet
perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver
perf auxtrace arm: Refactor event list iteration in auxtrace_record__init()
perf tests stat+json_output: Include sanity check for topology
perf tests stat+csv_output: Include sanity check for topology
perf intel-pt: Fix system_wide dummy event for hybrid
perf intel-pt: Fix segfault in intel_pt_print_info() with uClibc
perf test: Fix attr tests for PERF_FORMAT_LOST
perf test: test_intel_pt.sh: Add 9 tests
perf inject: Fix GEN_ELF_TEXT_OFFSET for jit
perf test: test_intel_pt.sh: Add jitdump test
perf test: test_intel_pt.sh: Tidy some alignment
perf test: test_intel_pt.sh: Print a message when skipping kernel tracing
perf test: test_intel_pt.sh: Tidy some perf record options
perf test: test_intel_pt.sh: Fix return checking again
perf: Skip and warn on unknown format 'configN' attrs
perf list: Fix metricgroups title message
perf mem: Fix -C option behavior for perf mem record
perf annotate: Add missing condition flags for arm64
...
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for
the combination of Clang >= 14 and GAS <= 2.35.
- Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
the package size.
- Fix modpost error under build environments using musl.
- Make *.ll files keep value names for easier debugging
- Fix single directory build
- Prevent RISC-V from selecting the broken DWARF5 support when Clang
and GAS are used together.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmNMSCcVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGuVsP/j9FBN3x9S14gAHpu4BAFLK0s31W
A5sGtmEb1keLqW4oY7/5bcr8KgIrY1extJBeSOJHLB1z/cfU7CHd7bl3+oadZH+z
BNQ7F9SAHm9GuZoM58TMmC5/Eq0a45bqEP32wvoscyrFQ0ka11aQw/lOZmVTYSgO
NrTHUSD6NmJCG8hbMiJAH8ch+fziSR0JXOomOwJDxs63aXHhavjZ3z7pgySnuPav
PD46QtKtpjH8H+gx4nJMqDWjaukGlq7+kVIHhZh3oC5KU23UfUc3d3U+Lpati4+w
Ggl1pmR5iMsYioQ/MaC58hb06WkamAYRfxKWXvpzEAVGIHF+xhMdGybK4FOPQkQh
J9Rb358LD1d/QtH6C77wajaEj1FvQLaOQ8CHUDSzjgGwJuz+qrpI8kwtgRxJCXgp
0+2YQxdfWR2kJ9W7lnyguVjM7AYebqS7bCGm2fDPU92NWftw4y2TJii1v10BCD/N
dB3orKHPp3mosAS2SdTXgMYYMlzFMzgma0PzibWvm4DE4tHtndRMvW/8c5UyB8uk
ganuHOUg8Vup79OiANSD6lJrzq0fZofvD3euD61mis6s39GAeHvr5rlwy0xOoN8A
TgOBu2DQFUKrlZH2m4F+hEBzCz26HTkg8+S5DNpb7Qr2EKDlLPT3xjwhQlooipNc
KuZNXoR6wEstepn/
=EZAr
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Fix CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y compile error for the
combination of Clang >= 14 and GAS <= 2.35.
- Drop vmlinux.bz2 from the rpm package as it just annoyingly increased
the package size.
- Fix modpost error under build environments using musl.
- Make *.ll files keep value names for easier debugging
- Fix single directory build
- Prevent RISC-V from selecting the broken DWARF5 support when Clang
and GAS are used together.
* tag 'kbuild-fixes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
lib/Kconfig.debug: Add check for non-constant .{s,u}leb128 support to DWARF5
kbuild: fix single directory build
kbuild: add -fno-discard-value-names to cmd_cc_ll_c
scripts/clang-tools: Convert clang-tidy args to list
modpost: put modpost options before argument
kbuild: Stop including vmlinux.bz2 in the rpm's
Kconfig.debug: add toolchain checks for DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT
Kconfig.debug: simplify the dependency of DEBUG_INFO_DWARF4/5
The clk rate range series needed another week to fully bake. Maxime
fixed the bug that broke clk notifiers and prevented this from being
included in the first pull request. He also added a unit test on top to
make sure it doesn't break so easily again. The majority of the series
fixes up how the clk_set_rate_*() APIs work, particularly around when
the rate constraints are dropped and how they move around when
reparenting clks. Overall it's a much needed improvement to the clk rate
range APIs that used to be pretty broken if you looked sideways.
Beyond the core changes there are a few driver fixes for a compilation
issue or improper data causing clks to fail to register or have the
wrong parents. These are good to get in before the first -rc so that the
system actually boots on the affected devices.
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAmNLfwARHHNib3lkQGtl
cm5lbC5vcmcACgkQrQKIl8bklSW1uhAA0QMgI8Ywv18PfZi2vWGpAO9sB62DfmdU
sbmsXrfEHdJmXmqT66Ydr8area6loeCagK4Vm/dEcgsf2DwI/xCdDra/ARZjLU49
9VjgC332YLZzk6bHXsY2eqCA2TS6nV4ZoVsonkQv2vezYNLNk7FTgByzIcWpYiY8
RuKEVHnp2yWwk5HrX+pELR0dMDCLTB3p+WVHmnQpyYVK+rcfaSNuDxLTSNXb3yEl
tGTUu4eL09FMwyRZ4iGmRXpvzIacbthYmnGmEtnOYDeFV3k3wBwHPbEizstvS0MI
vv89aHdsOnGgTdzPUZtA6UppByajyoDKbYzb3hw8pUPNNykbq6XmaeBTV7U2O9De
ihfeHVlDjN2HCv1HXfDsCaqlD6WM25T+pmKPT45Zj9b+rKkxloVxsOBuLmMzDgNd
fa1X3qfGfzm5jpZ1SVSTLdRSqOT5Q00nzAgQailUiDoumgdTSN5ZDNPHcIv/Crvn
me+pabVldp0tgYvuNWYr46u7ugwnzUMBVJfnQ+xZTl8xqLQ/yRmkkhB/rsS5RDY1
z6NZ66JyHpSwnaev4Ozjyj8GlRdgDUVHGa/4Vm1jJSbUb7THZGSzjCklZXPxkn8I
VqHNJN+luzaf6bKe85yrgUJKOi8NMZZS//MKWDnOdhDqgqtI0pM4hJX+Tvb2Bc4B
2SA7XzHesKg=
=ZG8Y
-----END PGP SIGNATURE-----
Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull more clk updates from Stephen Boyd:
"This is the final part of the clk patches for this merge window.
The clk rate range series needed another week to fully bake. Maxime
fixed the bug that broke clk notifiers and prevented this from being
included in the first pull request. He also added a unit test on top
to make sure it doesn't break so easily again. The majority of the
series fixes up how the clk_set_rate_*() APIs work, particularly
around when the rate constraints are dropped and how they move around
when reparenting clks. Overall it's a much needed improvement to the
clk rate range APIs that used to be pretty broken if you looked
sideways.
Beyond the core changes there are a few driver fixes for a compilation
issue or improper data causing clks to fail to register or have the
wrong parents. These are good to get in before the first -rc so that
the system actually boots on the affected devices"
* tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (31 commits)
clk: tegra: Fix Tegra PWM parent clock
clk: at91: fix the build with binutils 2.27
clk: qcom: gcc-msm8660: Drop hardcoded fixed board clocks
clk: mediatek: clk-mux: Add .determine_rate() callback
clk: tests: Add tests for notifiers
clk: Update req_rate on __clk_recalc_rates()
clk: tests: Add missing test case for ranges
clk: qcom: clk-rcg2: Take clock boundaries into consideration for gfx3d
clk: Introduce the clk_hw_get_rate_range function
clk: Zero the clk_rate_request structure
clk: Stop forwarding clk_rate_requests to the parent
clk: Constify clk_has_parent()
clk: Introduce clk_core_has_parent()
clk: Switch from __clk_determine_rate to clk_core_round_rate_nolock
clk: Add our request boundaries in clk_core_init_rate_req
clk: Introduce clk_hw_init_rate_request()
clk: Move clk_core_init_rate_req() from clk_core_round_rate_nolock() to its caller
clk: Change clk_core_init_rate_req prototype
clk: Set req_rate on reparenting
clk: Take into account uncached clocks in clk_set_rate_range()
...
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmNLLdMACgkQiiy9cAdy
T1EWGwv/SN2nGgQ5QBxcqHmKyYyP/ndo3iig6L55VdBrWjTIH6vMAJfDAZwq5Veb
YNkN6EQhT4XQ8RQ3nzJdSuwSLZu3CF5JMdChyoM3qgLSrT9CU3n2E5n0S/OK7TNI
QvX2RMZrOiwl2sApJXk5xQgmocGwNOMdsI9IwxZ8zDATpLUicwYmcNajDTmPHjX7
W8IVh4hHTBmVpnN5WkKLninLCz23eqcEMQb+nxtD3sMhOAUGn6gkKo+0L1EOYwgS
jFfZxGbLoIBi1IhMcoWEcODnDtpwS9vtGDy2ZPVkefp5LqjjEEqR7LQLAvSt0dYK
MpXHAvTUO8T7EEDx2djUNVM0JfXYBa0w3iOqSkOS2EBZF/Q8/ABWccfNpCTnGoAZ
G3VJvDP/K5iPiRwy/0KK6CeCvCiXgBslcvRVdIvSCsehuvEyYTXRrjRLHElZBVZz
Odwn8C9UGy5w0DTQ6QvwYtYWpoi2UFm8GuoswOfljYwnrul+7P0R14xxARJE8TOh
fBXL0FAu
=8BkB
-----END PGP SIGNATURE-----
Merge tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more cifs updates from Steve French:
- fix a regression in guest mounts to old servers
- improvements to directory leasing (caching directory entries safely
beyond the root directory)
- symlink improvement (reducing roundtrips needed to process symlinks)
- an lseek fix (to problem where some dir entries could be skipped)
- improved ioctl for returning more detailed information on directory
change notifications
- clarify multichannel interface query warning
- cleanup fix (for better aligning buffers using ALIGN and round_up)
- a compounding fix
- fix some uninitialized variable bugs found by Coverity and the kernel
test robot
* tag '6.1-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
smb3: improve SMB3 change notification support
cifs: lease key is uninitialized in two additional functions when smb1
cifs: lease key is uninitialized in smb1 paths
smb3: must initialize two ACL struct fields to zero
cifs: fix double-fault crash during ntlmssp
cifs: fix static checker warning
cifs: use ALIGN() and round_up() macros
cifs: find and use the dentry for cached non-root directories also
cifs: enable caching of directories for which a lease is held
cifs: prevent copying past input buffer boundaries
cifs: fix uninitialised var in smb2_compound_op()
cifs: improve symlink handling for smb2+
smb3: clarify multichannel warning
cifs: fix regression in very old smb1 mounts
cifs: fix skipping to incorrect offset in emit_cached_dirents
This reverts commit 78e5a33994 ("cpumask: fix checking valid cpu range").
syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at
cpu_max_bits_warn() [1], for commit 78e5a33994 ("cpumask: fix checking
valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE()
when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of
"cpu" will trivially hit cpu == nr_cpumask_bits condition.
Although syzbot found this problem in linux-next.git on 2022/09/27 [2],
this problem was not fixed immediately. As a result, that patch was
sent to linux.git before the patch author recognizes this problem, and
syzbot started failing to test changes in linux.git since 2022/10/10
[3].
Andrew Jones proposed a fix for x86 and riscv architectures [4]. But
[2] and [5] indicate that affected locations are not limited to arch
code. More delay before we find and fix affected locations, less tested
kernel (and more difficult to bisect and fix) before release.
We should have inspected and fixed basically all cpumask users before
applying that patch. We should not crash kernels in order to ask
existing cpumask users to update their code, even if limited to
CONFIG_DEBUG_PER_CPU_MAPS=y case.
Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1]
Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2]
Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3]
Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4]
Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5]
Reported-by: Andrew Jones <ajones@ventanamicro.com>
Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Borislav Petkov <bp@alien8.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building with a RISC-V kernel with DWARF5 debug info using clang
and the GNU assembler, several instances of the following error appear:
/tmp/vgettimeofday-48aa35.s:2963: Error: non-constant .uleb128 is not supported
Dumping the .s file reveals these .uleb128 directives come from
.debug_loc and .debug_ranges:
.Ldebug_loc0:
.byte 4 # DW_LLE_offset_pair
.uleb128 .Lfunc_begin0-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp1-.Lfunc_begin0 # ending offset
.byte 1 # Loc expr size
.byte 90 # DW_OP_reg10
.byte 0 # DW_LLE_end_of_list
.Ldebug_ranges0:
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp6-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp27-.Lfunc_begin0 # ending offset
.byte 4 # DW_RLE_offset_pair
.uleb128 .Ltmp28-.Lfunc_begin0 # starting offset
.uleb128 .Ltmp30-.Lfunc_begin0 # ending offset
.byte 0 # DW_RLE_end_of_list
There is an outstanding binutils issue to support a non-constant operand
to .sleb128 and .uleb128 in GAS for RISC-V but there does not appear to
be any movement on it, due to concerns over how it would work with
linker relaxation.
To avoid these build errors, prevent DWARF5 from being selected when
using clang and an assembler that does not have support for these symbol
deltas, which can be easily checked in Kconfig with as-instr plus the
small test program from the dwz test suite from the binutils issue.
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=27215
Link: https://github.com/ClangBuiltLinux/linux/issues/1719
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Commit f110e5a250 ("kbuild: refactor single builds of *.ko") was wrong.
KBUILD_MODULES _is_ needed for single builds.
Otherwise, "make foo/bar/baz/" does not build module objects at all.
Fixes: f110e5a250 ("kbuild: refactor single builds of *.ko")
Reported-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: David Sterba <dsterba@suse.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmNLD/EACgkQ4CHKc/GJ
qRCnhQf+Oj0qB9bdy+MmgirN0/VHFmTQbNSYUd/gzGmfcAHpxIE9KG0V9+y9I2wG
Nh6WgUKwX1IEKQ37X+VT/XsIe9VcALcn5LjxD/J4cL71CREa/0HGQbBavt9GuDsC
zkUwxYx6iAtGfK/PK9jE2eHIzxzfZ6kEkFsMaS+jP/8iLnE9trAhQ1o6vG15EFPA
MHjJ3+y7AsUE7SYHKL+8WLA+QR443SlHN0u327KkA2kKpjsj+hqQdiPfHqOArBbo
vw2DI14tcELGtruo5zHMVT9TcXWV7hcJ6yTTnaKxI+WCbgsEpPQKevTmc7q9P0H4
hLgQEElRuzBrXUCIBPVboNuTgGNjLQ==
=cVwd
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab hotfix from Vlastimil Babka:
"A single fix for the common-kmalloc series, for warnings on mips and
sparc64 reported by Guenter Roeck"
* tag 'slab-for-6.1-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation
I have relocated to London so not much work from me while I get settled.
Still, OpenRISC picked up two patches in this window:
- Fix for kernel page table walking from Jann Horn
- MAINTAINER entry cleanup from Palmer Dabbelt
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE2cRzVK74bBA6Je/xw7McLV5mJ+QFAmNK478ACgkQw7McLV5m
J+QdSw/+NFTmfpo40EGa3IdrK9ZQCgQoPN7rGKEr+QpRcVZuER+oDax6+hbobmBp
E+0Trdx8px/YJLWVC9imGxqMS8ev1twLy3VrCi+5p7B6mYUkMcDkf5Et1uI5WLky
3BBnw92Ycmc+bR2ROESRp4mvMHDdVhdIt60sHSFjcGTH2FnB6IiwUFX/GrcxYy9U
kkLrQ7AW36SzVIZ/0CNYZwxvlHv5n9aFiii0SKQriCuWDT2EvUw3f4kaOEzTg7pA
UKUU3CU8n4Rg9LeHiVb0KqLlGS+/ZntO9Ad2jgYvF4X42ijLnTgjSUOjmeC1D5R8
N0h1Y9WByL513MymfnptVrZSIhVOFhtsPgP2ys4CXMfHX+0iRKNOt3ZpSHFXpjDe
SlVA2k+BBt8SPwAIMoQBClaGKdgT0vjTE/l389BujhIDgdG18HEIhB77SLnyXvNq
VuXfeRlQuvEkSGH0sMdFr9rBM08Q3fS5S+i+2YJucChLndnZr+iE2KhYiG/tzia3
u64LmYDrPFNTIDjg8huLnHmUxlVcPK9vQSn3VRI1pKn47v4zD3rHa4mFryR6rhNw
xQDFOmAeeOmBiHoatwObGHaItEChLWvBW63JdUh873N2lg5UwcqqKp8e10KT35kG
CZA3g4OSTXM+PDWOJi1tDec08smIFqUv/U53tckcGV4b0myFNk0=
=WKxB
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of https://github.com/openrisc/linux
Pull OpenRISC updates from Stafford Horne:
"I have relocated to London so not much work from me while I get
settled.
Still, OpenRISC picked up two patches in this window:
- Fix for kernel page table walking from Jann Horn
- MAINTAINER entry cleanup from Palmer Dabbelt"
* tag 'for-linus' of https://github.com/openrisc/linux:
MAINTAINERS: git://github -> https://github.com for openrisc
openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmNKzlUUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxBbA//QYuw8iZHzshJntCyWlTRg742x2jI
2NrLojVG/ccSlqnl/KRultaPFdSlZb221UASt7CAEIGoP6+/SNepazEEJ4QJhMX6
inY/hYrVd2OHhCnO+OoNd7DKfbeJRkFVMHOGJdpbQ7jMwa99YSaN6xEMfALcZIrg
8CqYKDpdJW4avYyKlfXayY42d8gljWJwfLuK1xEnnIrdvXQEB3vj3yKxziadhBtA
EJbLWORVF+yQwq63NCPkLvCe0ZuV09a4q4IhlzWo+30FLMOsk2z4HBi1B2fwlawV
tnYCt0i6FNC6e3DJyWX/hkAiIbJUadnych+LcGq+/FY30tCBtNkNaQN3XsrvdvI4
5piacj1av21R7JHSyk64M4jdNlE7MYonQNsc8vp4yDtlAjQYkaj7BJdsIq3MCTei
ce6mxguVb2hrMr4fZXc8ka7XZkofJNv/XhfuQAorzKfEqA8cI4enICsPdoTFyZDV
pYYuHIJXY4rTm0wUD+TlDg/Cn6gyLu3WYVyEn6q/E53Twj1rr5ske7igDMDVYYly
qMg8CTPHYzHJ0wbzkf0CfND7SVAAE1BnIzr4Drfbrm+fi11HVhd0u/0XiWf2NthW
/ZFQUDBh0SwSHgUnO+ONqcR3arhIT4PEsEJadD4kuW4nIyPS7t1Nfu0V3XUZDa14
U6UeL6URnXBDV7U=
=gs3v
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull pci fix from Bjorn Helgaas:
"Revert the attempt to distribute spare resources to unconfigured
hotplug bridges at boot time.
This fixed some dock hot-add scenarios, but Jonathan Cameron reported
that it broke a topology with a multi-function device where one
function was a Switch Upstream Port and the other was an Endpoint"
* tag 'pci-v6.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
Revert "PCI: Distribute available resources for root buses, too"
After commit d6a71648db ("mm/slab: kmalloc: pass requests larger than
order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2)
requests to buddy like SLUB does.
SLAB has been using kmalloc caches to allocate freelist_idx_t array for
off slab caches. But after the commit, freelist_size can be bigger than
KMALLOC_MAX_CACHE_SIZE.
Instead of using pointer to kmalloc cache, use kmalloc_node() and only
check if the kmalloc cache is off slab during calculate_slab_order().
If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens
as it allocates freelist_idx_t array directly from buddy.
Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: d6a71648db ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator")
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Github deprecated the git:// links about a year ago, so let's move to
the https:// URLs instead.
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://github.blog/2021-09-01-improving-git-protocol-security-github/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Stafford Horne <shorne@gmail.com>
Change notification is a commonly supported feature by most servers,
but the current ioctl to request notification when a directory is
changed does not return the information about what changed
(even though it is returned by the server in the SMB3 change
notify response), it simply returns when there is a change.
This ioctl improves upon CIFS_IOC_NOTIFY by returning the notify
information structure which includes the name of the file(s) that
changed and why. See MS-SMB2 2.2.35 for details on the individual
filter flags and the file_notify_information structure returned.
To use this simply pass in the following (with enough space
to fit at least one file_notify_information structure)
struct __attribute__((__packed__)) smb3_notify {
uint32_t completion_filter;
bool watch_tree;
uint32_t data_len;
uint8_t data[];
} __packed;
using CIFS_IOC_NOTIFY_INFO 0xc009cf0b
or equivalently _IOWR(CIFS_IOCTL_MAGIC, 11, struct smb3_notify_info)
The ioctl will block until the server detects a change to that
directory or its subdirectories (if watch_tree is set).
Acked-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
cifs_open and _cifsFileInfo_put also end up with lease_key uninitialized
in smb1 mounts. It is cleaner to set lease key to zero in these
places where leases are not supported (smb1 can not return lease keys
so the field was uninitialized).
Addresses-Coverity: 1514207 ("Uninitialized scalar variable")
Addresses-Coverity: 1514331 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
It is cleaner to set lease key to zero in the places where leases are not
supported (smb1 can not return lease keys so the field was uninitialized).
Addresses-Coverity: 1513994 ("Uninitialized scalar variable")
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
Coverity spotted that we were not initalizing Stbz1 and Stbz2 to
zero in create_sd_buf.
Addresses-Coverity: 1513848 ("Uninitialized scalar variable")
Cc: <stable@vger.kernel.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>