linux

Author	SHA1	Message	Date
Yonghong Song	d7e7b42f4f	bpf: Fix a btf decl_tag bug when tagging a function syzbot reported a btf decl_tag bug with stack trace below: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 PID: 3592 Comm: syz-executor914 Not tainted 5.16.0-syzkaller-11424-gb7892f7d5cb2 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:btf_type_vlen include/linux/btf.h:231 [inline] RIP: 0010:btf_decl_tag_resolve+0x83e/0xaa0 kernel/bpf/btf.c:3910 ... Call Trace: <TASK> btf_resolve+0x251/0x1020 kernel/bpf/btf.c:4198 btf_check_all_types kernel/bpf/btf.c:4239 [inline] btf_parse_type_sec kernel/bpf/btf.c:4280 [inline] btf_parse kernel/bpf/btf.c:4513 [inline] btf_new_fd+0x19fe/0x2370 kernel/bpf/btf.c:6047 bpf_btf_load kernel/bpf/syscall.c:4039 [inline] __sys_bpf+0x1cbb/0x5970 kernel/bpf/syscall.c:4679 __do_sys_bpf kernel/bpf/syscall.c:4738 [inline] __se_sys_bpf kernel/bpf/syscall.c:4736 [inline] __x64_sys_bpf+0x75/0xb0 kernel/bpf/syscall.c:4736 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae The kasan error is triggered with an illegal BTF like below: type 0: void type 1: int type 2: decl_tag to func type 3 type 3: func to func_proto type 8 The total number of types is 4 and the type 3 is illegal since its func_proto type is out of range. Currently, the target type of decl_tag can be struct/union, var or func. Both struct/union and var implemented their own 'resolve' callback functions and hence handled properly in kernel. But func type doesn't have 'resolve' callback function. When btf_decl_tag_resolve() tries to check func type, it tries to get vlen of its func_proto type, which triggered the above kasan error. To fix the issue, btf_decl_tag_resolve() needs to do btf_func_check() before trying to accessing func_proto type. In the current implementation, func type is checked with btf_func_check() in the main checking function btf_check_all_types(). To fix the above kasan issue, let us implement 'resolve' callback func type properly. The 'resolve' callback will be also called in btf_check_all_types() for func types. Fixes: `b5ea834dde` ("bpf: Support for new btf kind BTF_KIND_TAG") Reported-by: syzbot+53619be9444215e785ed@syzkaller.appspotmail.com Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220203191727.741862-1-yhs@fb.com	2022-02-03 13:06:04 -08:00
Mickaël Salaün	1f2cfdd349	printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin() The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to kernel/printk/sysctl.c introduced an incorrect __user attribute to the buffer argument. I spotted this change in [1] as well as the kernel test robot. Revert this change to please sparse: kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces) kernel/printk/sysctl.c:20:51: expected void * kernel/printk/sysctl.c:20:51: got void [noderef] __user *buffer Fixes: `faaa357a55` ("printk: move printk sysctl to printk/sysctl.c") Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net [1] Reported-by: kernel test robot <lkp@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: John Ogness <john.ogness@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com> Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-02-03 11:27:38 -08:00
Igor Pylypiv	67d6212afd	Revert "module, async: async_synchronize_full() on module init iff async is used" This reverts commit `774a1221e8`. We need to finish all async code before the module init sequence is done. In the reverted commit the PF_USED_ASYNC flag was added to mark a thread that called async_schedule(). Then the PF_USED_ASYNC flag was used to determine whether or not async_synchronize_full() needs to be invoked. This works when modprobe thread is calling async_schedule(), but it does not work if module dispatches init code to a worker thread which then calls async_schedule(). For example, PCI driver probing is invoked from a worker thread based on a node where device is attached: if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); We end up in a situation where a worker thread gets the PF_USED_ASYNC flag set instead of the modprobe thread. As a result, async_synchronize_full() is not invoked and modprobe completes without waiting for the async code to finish. The issue was discovered while loading the pm80xx driver: (scsi_mod.scan=async) modprobe pm80xx worker ... do_init_module() ... pci_call_probe() work_on_cpu(local_pci_probe) local_pci_probe() pm8001_pci_probe() scsi_scan_host() async_schedule() worker->flags \|= PF_USED_ASYNC; ... < return from worker > ... if (current->flags & PF_USED_ASYNC) <--- false async_synchronize_full(); Commit `21c3c5d280` ("block: don't request module during elevator init") fixed the deadlock issue which the reverted commit `774a1221e8` ("module, async: async_synchronize_full() on module init iff async is used") tried to fix. Since commit `0fdff3ec6d` ("async, kmod: warn on synchronous request_module() from async workers") synchronous module loading from async is not allowed. Given that the original deadlock issue is fixed and it is no longer allowed to call synchronous request_module() from async we can remove PF_USED_ASYNC flag to make module init consistently invoke async_synchronize_full() unless async module probe is requested. Signed-off-by: Igor Pylypiv <ipylypiv@google.com> Reviewed-by: Changyuan Lyu <changyuanl@google.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-02-03 11:20:34 -08:00
Linus Torvalds	305e6c42e8	Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: - Eric's fix for a long standing cgroup1 permission issue where it only checks for uid 0 instead of CAP which inadvertently allows unprivileged userns roots to modify release_agent userhelper - Fixes for the fallout from Waiman's recent cpuset work * 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning cgroup-v1: Require capabilities to set release_agent cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy	2022-02-03 08:15:13 -08:00
Waiman Long	2bdfd2825c	cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning It was found that a "suspicious RCU usage" lockdep warning was issued with the rcu_read_lock() call in update_sibling_cpumasks(). It is because the update_cpumasks_hier() function may sleep. So we have to release the RCU lock, call update_cpumasks_hier() and reacquire it afterward. Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks() instead of stating that in the comment. Fixes: `4716909cc5` ("cpuset: Track cpusets that use parent's effective_cpus") Signed-off-by: Waiman Long <longman@redhat.com> Tested-by: Phil Auld <pauld@redhat.com> Reviewed-by: Phil Auld <pauld@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-02-03 05:59:01 -10:00
Hou Tao	b293dcc473	bpf: Use VM_MAP instead of VM_ALLOC for ringbuf After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages after mapping"), non-VM_ALLOC mappings will be marked as accessible in __get_vm_area_node() when KASAN is enabled. But now the flag for ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access after vmap() returns. Because the ringbuf area is created by mapping allocated pages, so use VM_MAP instead. After the change, info in /proc/vmallocinfo also changes from [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmalloc user to [start]-[end] 24576 ringbuf_map_alloc+0x171/0x290 vmap user Fixes: `457f44363a` ("bpf: Implement BPF ring buffer and verifier support for it") Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com	2022-02-02 23:15:24 -08:00
Changbin Du	fe13889c39	genirq, softirq: Use in_hardirq() instead of in_irq() Replace the obsolete and ambiguos macro in_irq() with the new macro in_hardirq(). Signed-off-by: Changbin Du <changbin.du@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220128110727.5110-1-changbin.du@gmail.com	2022-02-02 21:34:19 +01:00
Christoph Hellwig	aa8dcccaf3	block: check that there is a plug in blk_flush_plug Rename blk_flush_plug to __blk_flush_plug and add a wrapper that includes the NULL check instead of open coding that check everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220127070549.1377856-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:50:00 -07:00
Christoph Hellwig	b1f866b013	block: remove blk_needs_flush_plug blk_needs_flush_plug fails to account for the cb_list, which needs flushing as well. Remove it and just check if there is a plug instead of poking into the internals of the plug structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220127070549.1377856-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:50:00 -07:00
Christoph Hellwig	07888c665b	block: pass a block_device and opf to bio_alloc Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Christoph Hellwig	322cbb50de	block: remove genhd.h There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Adrian Hunter	58b2ff2c18	perf/core: Allow kernel address filter when not filtering the kernel The so-called 'kernel' address filter can also be useful for filtering fixed addresses in user space. Allow that. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220131072453.2839535-6-adrian.hunter@intel.com	2022-02-02 13:11:43 +01:00
Adrian Hunter	d680ff24e9	perf/core: Fix address filter parser for multiple filters Reset appropriate variables in the parser loop between parsing separate filters, so that they do not interfere with parsing the next filter. Fixes: `375637bc52` ("perf/core: Introduce address range filtering") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220131072453.2839535-4-adrian.hunter@intel.com	2022-02-02 13:11:42 +01:00
Marco Elver	3c25fc97f5	perf: Copy perf_event_attr::sig_data on modification The intent has always been that perf_event_attr::sig_data should also be modifiable along with PERF_EVENT_IOC_MODIFY_ATTRIBUTES, because it is observable by user space if SIGTRAP on events is requested. Currently only PERF_TYPE_BREAKPOINT is modifiable, and explicitly copies relevant breakpoint-related attributes in hw_breakpoint_copy_attr(). This misses copying perf_event_attr::sig_data. Since sig_data is not specific to PERF_TYPE_BREAKPOINT, introduce a helper to copy generic event-type-independent attributes on modification. Fixes: `97ba62b278` ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Link: https://lore.kernel.org/r/20220131103407.1971678-1-elver@google.com	2022-02-02 13:11:40 +01:00
Zhen Ni	c8eaf6ac76	sched: move autogroup sysctls into its own file move autogroup sysctls to autogroup.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni <nizhen@uniontech.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220128095025.8745-1-nizhen@uniontech.com	2022-02-02 13:11:37 +01:00
Mathieu Desnoyers	bfdf4e6208	rseq: Remove broken uapi field layout on 32-bit little endian The rseq rseq_cs.ptr.{ptr32,padding} uapi endianness handling is entirely wrong on 32-bit little endian: a preprocessor logic mistake wrongly uses the big endian field layout on 32-bit little endian architectures. Fortunately, those ptr32 accessors were never used within the kernel, and only meant as a convenience for user-space. Remove those and replace the whole rseq_cs union by a __u64 type, as this is the only thing really needed to express the ABI. Document how 32-bit architectures are meant to interact with this field. Fixes: `ec9c82e03a` ("rseq: uapi: Declare rseq_cs field as union, update includes") Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220127152720.25898-1-mathieu.desnoyers@efficios.com	2022-02-02 13:11:34 +01:00
Waiman Long	fc153c1c58	clocksource: Add a Kconfig option for WATCHDOG_MAX_SKEW A watchdog maximum skew of 100us may still be too small for some systems or archs. It may also be too small when some kernel debug config options are enabled. So add a new Kconfig option CLOCKSOURCE_WATCHDOG_MAX_SKEW_US to allow kernel builders to have more control on the threshold for marking clocksource as unstable. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:35:43 -08:00
Paul E. McKenney	9c0f1c7fd7	rcutorture: Enable limited callback-flooding tests of SRCU This commit allows up to 50,000 callbacks worth of callback-flooding tests of SRCU. The goal of this change is to exercise Tree SRCU's ability to transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG triggered by callback-queue-time lock contention. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:39 -08:00
Paul E. McKenney	6b8646a9d3	torture: Wake up kthreads after storing task_struct pointer Currently, _torture_create_kthread() uses kthread_run() to create torture-test kthreads, which means that the resulting task_struct pointer is stored after the newly created kthread has been marked runnable. This in turn can cause spurious failure of checks for code being run by a particular kthread. This commit therefore changes _torture_create_kthread() to use kthread_create(), then to do an explicit wake_up_process() after the task_struct pointer has been stored. Reported-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:39 -08:00
Paul E. McKenney	89440d2dad	rcutorture: Fix rcu_fwd_mutex deadlock The rcu_torture_fwd_cb_hist() function acquires rcu_fwd_mutex, but is invoked from rcutorture_oom_notify() function, which hold this same mutex across this call. This commit fixes the resulting deadlock. Reported-by: kernel test robot <oliver.sang@intel.com> Tested-by: Oliver Sang <oliver.sang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:39 -08:00
Paul E. McKenney	02b51a1cf4	rcutorture: Add end-of-test check to rcu_torture_fwd_prog() loop The second and subsequent forward-progress kthreads loop waiting for the first forward-progress kthread to start the next test interval. Unfortunately, if the test ends while one of those kthreads is waiting, the test will hang. This hang occurs because that wait loop fails to check for the end of the test. This commit therefore adds an end-of-test check to that wait loop. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	e22ef8df41	rcutorture: Make rcu_fwd_cb_nodelay be a counter Back when only one rcutorture kthread could do forward-progress testing, it was just fine for rcu_fwd_cb_nodelay to be a non-atomic bool. It was set at the start of forward-progress testing and cleared at the end. But now that there are multiple threads, the value can be cleared while one of the threads is still doing forward-progress testing. This commit therefore makes rcu_fwd_cb_nodelay be an atomic counter, replacing the WRITE_ONCE() operations with atomic_inc() and atomic_dec(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	05b724655b	rcutorture: Increase visibility of forward-progress hangs This commit adds a few pr_alert() calls to rcutorture's forward-progress testing in order to better diagnose shutdown-time hangs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	2b4a7f20f1	torture: Distinguish kthread stopping and being asked to stop Right now, if a given kthread (call it "kthread") realizes that it needs to stop, "Stopping kthread" is written to the console. When the cleanup code decides that it is time to stop that kthread, "Stopping kthread tasks" is written to the console. These two events might happen in either order, especially in the case of time-based torture-test shutdown. But it is hard to distinguish these, especially for those unfamiliar with the torture tests. This commit therefore changes the first case from "Stopping kthread" to "kthread is stopping" to make things more clear. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Paul E. McKenney	6f81bd6a4e	rcutorture: Print message before invoking ->cb_barrier() The various ->cb_barrier() functions, for example, rcu_barrier(), sometimes cause rcutorture hangs. But currently, the last console message is the unenlightening "Stopping rcu_torture_stats". This commit therefore prints a message of the form "rcu_torture_cleanup: Invoking rcu_barrier+0x0/0x1e0()" to help point people in the right direction. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:24:38 -08:00
Zqiang	c951587585	rcu: Add per-CPU rcuc task dumps to RCU CPU stall warnings When the rcutree.use_softirq kernel boot parameter is set to zero, all RCU_SOFTIRQ processing is carried out by the per-CPU rcuc kthreads. If these kthreads are being starved, quiescent states will not be reported, which in turn means that the grace period will not end, which can in turn trigger RCU CPU stall warnings. This commit therefore dumps stack traces of stalled CPUs' rcuc kthreads, which can help identify what is preventing those kthreads from running. Suggested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Reviewed-by: Ammar Faizi <ammarfaizi2@gnuweeb.org> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:22:17 -08:00
Paul E. McKenney	10c5357874	rcu: Don't deboost before reporting expedited quiescent state Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx before reporting the expedited quiescent state. Under heavy real-time load, this can result in this function being preempted before the quiescent state is reported, which can in turn prevent the expedited grace period from completing. Tim Murray reports that the resulting expedited grace periods can take hundreds of milliseconds and even more than one second, when they should normally complete in less than a millisecond. This was fine given that there were no particular response-time constraints for synchronize_rcu_expedited(), as it was designed for throughput rather than latency. However, some users now need sub-100-millisecond response-time constratints. This patch therefore follows Neeraj's suggestion (seconded by Tim and by Uladzislau Rezki) of simply reversing the two operations. Reported-by: Tim Murray <timmurray@google.com> Reported-by: Joel Fernandes <joelaf@google.com> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Tim Murray <timmurray@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:41 -08:00
Alison Chaiken	c8b16a6526	rcu: Elevate priority of offloaded callback threads When CONFIG_PREEMPT_RT=y, the rcutree.kthread_prio command-line parameter signals initialization code to boost the priority of rcuc callbacks to the designated value. With the additional CONFIG_RCU_NOCB_CPU=y configuration and an additional rcu_nocbs command-line parameter, the callbacks on the listed cores are offloaded to new rcuop kthreads that are not pinned to the cores whose post-grace-period work is performed. While the rcuop kthreads perform the same function as the rcuc kthreads they offload, the kthread_prio parameter only boosts the priority of the rcuc kthreads. Fix this inconsistency by elevating rcuop kthreads to the same priority as the rcuc kthreads. Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:25 -08:00
Alison Chaiken	54577e23fa	rcu: Make priority of grace-period thread consistent The priority of RCU grace period threads is set to kthread_prio when they are launched from rcu_spawn_gp_kthread(). The same is not true of rcu_spawn_one_nocb_kthread(). Accordingly, add priority elevation to rcu_spawn_one_nocb_kthread(). Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:17 -08:00
Alison Chaiken	c8db27dd0e	rcu: Move kthread_prio bounds-check to a separate function Move the bounds-check of the kthread_prio cmdline parameter to a new function in order to faciliate a different callsite. Signed-off-by: Alison Chaiken <achaiken@aurora.tech> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Zqiang	4b4399b245	rcu: Create per-cpu rcuc kthreads only when rcutree.use_softirq=0 The per-CPU "rcuc" kthreads are used only by kernels booted with rcutree.use_softirq=0, but they are nevertheless unconditionally created by kernels built with CONFIG_RCU_BOOST=y. This results in "rcuc" kthreads being created that are never actually used. This commit therefore refrains from creating these kthreads unless the kernel is actually booted with rcutree.use_softirq=0. Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Neeraj Upadhyay	eae9f147a4	rcu: Remove unused rcu_state.boost Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Neeraj Upadhyay	02e3024175	rcu/nocb: Handle concurrent nocb kthreads creation When multiple CPUs in the same nocb gp/cb group concurrently come online, they might try to concurrently create the same rcuog kthread. Fix this by using nocb gp CPU's spawn mutex to provide mutual exclusion for the rcuog kthread creation code. [ paulmck: Whitespace fixes per kernel test robot feedback. ] Acked-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Paul E. McKenney	a47f9f131d	rcu: Mark accesses to boost_starttime The boost_starttime shared variable has conflicting unmarked C-language accesses, which are dangerous at best. This commit therefore adds appropriate marking. This was found by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:19:02 -08:00
Paul E. McKenney	63c564da11	rcu: Mark ->expmask access in synchronize_rcu_expedited_wait() This commit adds a READ_ONCE() to an access to the rcu_node structure's ->expmask field to prevent compiler mischief. Detected by KCSAN. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:05:10 -08:00
Neeraj Upadhyay	4d266c247d	rcu/exp: Fix check for idle context in rcu_exp_handler For PREEMPT_RCU, the rcu_exp_handler() function checks whether the current CPU is in idle, by calling rcu_dynticks_curr_cpu_in_eqs(). However, rcu_exp_handler() is called in IPI handler context. So, it should be checking the idle context using rcu_is_cpu_rrupt_from_idle(). Fix this by using rcu_is_cpu_rrupt_from_idle() instead of rcu_dynticks_curr_cpu_in_eqs(). Non-preempt configuration already uses the correct check. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-02-01 17:05:10 -08:00
Alexei Starovoitov	e96f2d64c8	bpf: Drop libbpf, libelf, libz dependency from bpf preload. Drop libbpf, libelf, libz dependency from bpf preload. This reduces bpf_preload_umd binary size from 1.7M to 30k unstripped with debug info and from 300k to 19k stripped. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220131220528.98088-8-alexei.starovoitov@gmail.com	2022-02-01 23:56:18 +01:00
Alexei Starovoitov	18ef5dac93	bpf: Open code obj_get_info_by_fd in bpf preload. Open code obj_get_info_by_fd in bpf preload. It's the last part of libbpf that preload/iterators were using. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220131220528.98088-7-alexei.starovoitov@gmail.com	2022-02-01 23:56:18 +01:00
Alexei Starovoitov	79b203926d	bpf: Convert bpf preload to light skeleton. Convert bpffs preload iterators to light skeleton. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220131220528.98088-6-alexei.starovoitov@gmail.com	2022-02-01 23:56:18 +01:00
Alexei Starovoitov	1ddbddd706	bpf: Remove unnecessary setrlimit from bpf preload. BPF programs and maps are memcg accounted. setrlimit is obsolete. Remove its use from bpf preload. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220131220528.98088-5-alexei.starovoitov@gmail.com	2022-02-01 23:56:18 +01:00
Linus Torvalds	61fda95541	Merge tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fix from Paul Moore: "A single audit patch to fix problems relating to audit queuing and system responsiveness when "audit=1" is specified on the kernel command line and the audit daemon is SIGSTOP'd for an extended period of time" * tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: improve audit queue handling when "audit=1" on cmdline	2022-02-01 11:07:09 -08:00
Eric W. Biederman	24f6008564	cgroup-v1: Require capabilities to set release_agent The cgroup release_agent is called with call_usermodehelper. The function call_usermodehelper starts the release_agent with a full set fo capabilities. Therefore require capabilities when setting the release_agaent. Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com> Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com> Fixes: `81a6a5cdd2` ("Task Control Groups: automatic userspace notification of idle cgroups") Cc: stable@vger.kernel.org # v2.6.24+ Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Tejun Heo <tj@kernel.org>	2022-02-01 07:28:00 -10:00
Kenta Tada	0407a65f35	bpf: make bpf_copy_from_user_task() gpl only access_process_vm() is exported by EXPORT_SYMBOL_GPL(). Signed-off-by: Kenta Tada <Kenta.Tada@sony.com> Link: https://lore.kernel.org/r/20220128170906.21154-1-Kenta.Tada@sony.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-01-31 12:44:37 -08:00
Yury Norov	1c4cafd115	padata: replace cpumask_weight with cpumask_empty in padata.c padata_do_parallel() calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-01-31 11:21:46 +11:00
Linus Torvalds	27a96c4feb	Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Prevent accesses to the per-CPU cgroup context list from another CPU except the one it belongs to, to avoid list corruption - Make sure parent events are always woken up to avoid indefinite hangs in the traced workload * tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix cgroup event list management perf: Always wake the parent event	2022-01-30 15:02:32 +02:00
Linus Torvalds	24f4db1f3a	Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: "Make sure the membarrier-rseq fence commands are part of the reported set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY" * tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask	2022-01-30 13:09:00 +02:00
Suren Baghdasaryan	44585f7bc0	psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n When CONFIG_PROC_FS is disabled psi code generates the following warnings: kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=] 1364 \| static const struct proc_ops psi_cpu_proc_ops = { \| ^~~~~~~~~~~~~~~~ kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=] 1355 \| static const struct proc_ops psi_memory_proc_ops = { \| ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=] 1346 \| static const struct proc_ops psi_io_proc_ops = { \| ^~~~~~~~~~~~~~~ Make definitions of these structures and related functions conditional on CONFIG_PROC_FS config. Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com Fixes: `0e94682b73` ("psi: introduce psi monitor") Signed-off-by: Suren Baghdasaryan <surenb@google.com> Reported-by: kernel test robot <lkp@intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-01-30 09:56:58 +02:00
Linus Torvalds	a7b4b0076b	Merge tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These make the buffer handling in pm_show_wakelocks() more robust and drop an unused hibernation-related function. Specifics: - Make the buffer handling in pm_show_wakelocks() more robust by using sysfs_emit_at() in it to generate output (Greg Kroah-Hartman). - Drop register_nosave_region_late() which is not used (Amadeusz Sławiński)" * tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: hibernate: Remove register_nosave_region_late() PM: wakeup: simplify the output logic of pm_show_wakelocks()	2022-01-28 20:44:07 +02:00
Linus Torvalds	df0001545b	Merge tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pulltracing fixes from Steven Rostedt: - Limit mcount build time sorting to only those archs that we know it works for. - Fix memory leak in error path of histogram setup - Fix and clean up rel_loc array out of bounds issue - tools/rtla documentation fixes - Fix issues with histogram logic * tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Don't inc err_log entry count if entry allocation fails tracing: Propagate is_signed to expression tracing: Fix smatch warning for do while check in event_hist_trigger_parse() tracing: Fix smatch warning for null glob in event_hist_trigger_parse() tools/tracing: Update Makefile to build rtla rtla: Make doc build optional tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro tracing: Avoid -Warray-bounds warning for __rel_loc macro tracing/histogram: Fix a potential memory leak for kstrdup() ftrace: Have architectures opt-in for mcount build time sorting	2022-01-28 19:30:35 +02:00
Linus Torvalds	76fcbc9c7c	Merge branch 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull ucount rlimit fix from Eric Biederman. Make sure the ucounts have a reference to the user namespace it refers to, so that users that themselves don't carry such a reference around can safely use the ucount functions. * 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: ucount: Make get_ucount a safe get_user replacement	2022-01-28 19:25:24 +02:00

... 27 28 29 30 31 ...

39738 Commits