Commit Graph

38280 Commits

Author SHA1 Message Date
Linus Torvalds
6f35736723 Merge tag 'sched_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
 "Fix a NULL-ptr dereference when recalculating a sched entity's weight"

* tag 'sched_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix fault in reweight_entity
2022-02-13 09:27:26 -08:00
Linus Torvalds
f5e02656b1 Merge tag 'perf_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
 "Prevent cgroup event list corruption when switching events"

* tag 'perf_urgent_for_v5.17_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix list corruption in perf_cgroup_switch()
2022-02-13 09:25:26 -08:00
Linus Torvalds
eef8cffcab Merge tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
 "This fixes a corner case of fatal SIGSYS being ignored since v5.15.
  Along with the signal fix is a change to seccomp so that seeing
  another syscall after a fatal filter result will cause seccomp to kill
  the process harder.

  Summary:

   - Force HANDLER_EXIT even for SIGNAL_UNKILLABLE

   - Make seccomp self-destruct after fatal filter results

   - Update seccomp samples for easier behavioral demonstration"

* tag 'seccomp-v5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  samples/seccomp: Adjust sample to also provide kill option
  seccomp: Invalidate seccomp mode to catch death failures
  signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
2022-02-12 09:04:05 -08:00
Linus Torvalds
883fd0aba1 Merge tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
 "These revert two commits that turned out to be problematic and fix two
  issues related to wakeup from suspend-to-idle on x86.

  Specifics:

   - Revert a recent change that attempted to avoid issues with
     conflicting address ranges during PCI initialization, because it
     turned out to introduce a regression (Hans de Goede).

   - Revert a change that limited EC GPE wakeups from suspend-to-idle to
     systems based on Intel hardware, because it turned out that systems
     based on hardware from other vendors depended on that functionality
     too (Mario Limonciello).

   - Fix two issues related to the handling of wakeup interrupts and
     wakeup events signaled through the EC GPE during suspend-to-idle on
     x86 (Rafael Wysocki)"

* tag 'acpi-5.17-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  x86/PCI: revert "Ignore E820 reservations for bridge windows on newer systems"
  PM: s2idle: ACPI: Fix wakeup interrupts handling
  ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE
  ACPI: PM: Revert "Only mark EC GPE for wakeup on Intel systems"
2022-02-11 11:48:13 -08:00
Linus Torvalds
32f6c5d037 Merge tag 'trace-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:

 - Fixes to the RTLA tooling

 - A fix to a tp_printk overriding tp_printk_stop_on_boot on the
   command line

* tag 'trace-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix tp_printk option related with tp_printk_stop_on_boot
  MAINTAINERS: Add RTLA entry
  rtla: Fix segmentation fault when failing to enable -t
  rtla/trace: Error message fixup
  rtla/utils: Fix session duration parsing
  rtla: Follow kernel version
2022-02-11 10:22:48 -08:00
Kees Cook
495ac3069a seccomp: Invalidate seccomp mode to catch death failures
If seccomp tries to kill a process, it should never see that process
again. To enforce this proactively, switch the mode to something
impossible. If encountered: WARN, reject all syscalls, and attempt to
kill the process again even harder.

Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Fixes: 8112c4f140 ("seccomp: remove 2-phase API")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
2022-02-10 19:09:12 -08:00
Kees Cook
5c72263ef2 signal: HANDLER_EXIT should clear SIGNAL_UNKILLABLE
Fatal SIGSYS signals (i.e. seccomp RET_KILL_* syscall filter actions)
were not being delivered to ptraced pid namespace init processes. Make
sure the SIGNAL_UNKILLABLE doesn't get set for these cases.

Reported-by: Robert Święcki <robert@swiecki.net>
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Fixes: 00b06da29c ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Link: https://lore.kernel.org/lkml/878rui8u4a.fsf@email.froward.int.ebiederm.org
2022-02-10 19:08:54 -08:00
Linus Torvalds
252787201e Merge tag 'audit-pr-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
 "Another audit fix, this time a single rather small but important fix
  for an oops/page-fault caused by improperly accessing userspace
  memory"

* tag 'audit-pr-20220209' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: don't deref the syscall args when checking the openat2 open_how::flags
2022-02-10 05:43:43 -08:00
Paul Moore
7a82f89de9 audit: don't deref the syscall args when checking the openat2 open_how::flags
As reported by Jeff, dereferencing the openat2 syscall argument in
audit_match_perm() to obtain the open_how::flags can result in an
oops/page-fault.  This patch fixes this by using the open_how struct
that we store in the audit_context with audit_openat2_how().

Independent of this patch, Richard Guy Briggs posted a similar patch
to the audit mailing list roughly 40 minutes after this patch was
posted.

Cc: stable@vger.kernel.org
Fixes: 1c30e3af8a ("audit: add support for the openat2 syscall")
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-02-09 16:04:26 -05:00
JaeSang Yoo
3203ce39ac tracing: Fix tp_printk option related with tp_printk_stop_on_boot
The kernel parameter "tp_printk_stop_on_boot" starts with "tp_printk" which is
the same as another kernel parameter "tp_printk". If "tp_printk" setup is
called before the "tp_printk_stop_on_boot", it will override the latter
and keep it from being set.

This is similar to other kernel parameter issues, such as:
  Commit 745a600cf1 ("um: console: Ignore console= option")
or init/do_mounts.c:45 (setup function of "ro" kernel param)

Fix it by checking for a "_" right after the "tp_printk" and if that
exists do not process the parameter.

Link: https://lkml.kernel.org/r/20220208195421.969326-1-jsyoo5b@gmail.com

Signed-off-by: JaeSang Yoo <jsyoo5b@gmail.com>
[ Fixed up change log and added space after if condition ]
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-02-08 15:47:00 -05:00
Rafael J. Wysocki
cb1f65c1e1 PM: s2idle: ACPI: Fix wakeup interrupts handling
After commit e3728b50cd ("ACPI: PM: s2idle: Avoid possible race
related to the EC GPE") wakeup interrupts occurring immediately after
the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
the SCI triggers again immediately after the rearming in
acpi_s2idle_wake(), that wakeup may be missed too.

The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
when pm_wakeup_irq is 0, but that's not the case any more after the
interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is
cleared by the pm_wakeup_clear() call in s2idle_loop().  However,
there may be wakeup interrupts occurring in that time frame and if
that happens, they will be missed.

To address that issue first move the clearing of pm_wakeup_irq to
the point at which it is known that the interrupt causing
acpi_s2idle_wake() to tun will be discarded, before rearming the SCI
for wakeup.  Moreover, because that only reduces the size of the
time window in which the issue may manifest itself, allow
pm_system_irq_wakeup() to register two second wakeup interrupts in
a row and, when discarding the first one, replace it with the second
one.  [Of course, this assumes that only one wakeup interrupt can be
discarded in one go, but currently that is the case and I am not
aware of any plans to change that.]

Fixes: e3728b50cd ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-07 21:02:31 +01:00
Song Liu
5f4e5ce638 perf: Fix list corruption in perf_cgroup_switch()
There's list corruption on cgrp_cpuctx_list. This happens on the
following path:

  perf_cgroup_switch: list_for_each_entry(cgrp_cpuctx_list)
      cpu_ctx_sched_in
         ctx_sched_in
            ctx_pinned_sched_in
              merge_sched_in
                  perf_cgroup_event_disable: remove the event from the list

Use list_for_each_entry_safe() to allow removing an entry during
iteration.

Fixes: 058fe1c044 ("perf/core: Make cgroup switch visit only cpuctxs with cgroup events")
Signed-off-by: Song Liu <song@kernel.org>
Reviewed-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220204004057.2961252-1-song@kernel.org
2022-02-06 22:37:27 +01:00
Tadeusz Struk
13765de814 sched/fair: Fix fault in reweight_entity
Syzbot found a GPF in reweight_entity. This has been bisected to
commit 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid
sched_task_group")

There is a race between sched_post_fork() and setpriority(PRIO_PGRP)
within a thread group that causes a null-ptr-deref in
reweight_entity() in CFS. The scenario is that the main process spawns
number of new threads, which then call setpriority(PRIO_PGRP, 0, -20),
wait, and exit.  For each of the new threads the copy_process() gets
invoked, which adds the new task_struct and calls sched_post_fork()
for it.

In the above scenario there is a possibility that
setpriority(PRIO_PGRP) and set_one_prio() will be called for a thread
in the group that is just being created by copy_process(), and for
which the sched_post_fork() has not been executed yet. This will
trigger a null pointer dereference in reweight_entity(), as it will
try to access the run queue pointer, which hasn't been set.

Before the mentioned change the cfs_rq pointer for the task  has been
set in sched_fork(), which is called much earlier in copy_process(),
before the new task is added to the thread_group.  Now it is done in
the sched_post_fork(), which is called after that.  To fix the issue
the remove the update_load param from the update_load param() function
and call reweight_task() only if the task flag doesn't have the
TASK_NEW flag set.

Fixes: 4ef0c5c6b5 ("kernel/sched: Fix sched_fork() access an invalid sched_task_group")
Reported-by: syzbot+af7a719bc92395ee41b3@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20220203161846.1160750-1-tadeusz.struk@linaro.org
2022-02-06 22:37:26 +01:00
Linus Torvalds
c3bf8a1440 Merge tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Intel/PT: filters could crash the kernel

 - Intel: default disable the PMU for SMM, some new-ish EFI firmware has
   started using CPL3 and the PMU CPL filters don't discriminate against
   SMM, meaning that CPL3 (userspace only) events now also count EFI/SMM
   cycles.

 - Fixup for perf_event_attr::sig_data

* tag 'perf_urgent_for_v5.17_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/pt: Fix crash with stop filters in single-range mode
  perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures
  selftests/perf_events: Test modification of perf_event_attr::sig_data
  perf: Copy perf_event_attr::sig_data on modification
  x86/perf: Default set FREEZE_ON_SMI for all
2022-02-06 10:11:14 -08:00
Kees Cook
dcb85f85fa gcc-plugins/stackleak: Use noinstr in favor of notrace
While the stackleak plugin was already using notrace, objtool is now a
bit more picky.  Update the notrace uses to noinstr.  Silences the
following objtool warnings when building with:

CONFIG_DEBUG_ENTRY=y
CONFIG_STACK_VALIDATION=y
CONFIG_VMLINUX_VALIDATION=y
CONFIG_GCC_PLUGIN_STACKLEAK=y

  vmlinux.o: warning: objtool: do_syscall_64()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: do_int80_syscall_32()+0x9: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: exc_general_protection()+0x22: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: fixup_bad_iret()+0x20: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: do_machine_check()+0x27: call to stackleak_track_stack() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .text+0x5346e: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x143: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x10eb: call to stackleak_erase() leaves .noinstr.text section
  vmlinux.o: warning: objtool: .entry.text+0x17f9: call to stackleak_erase() leaves .noinstr.text section

Note that the plugin's addition of calls to stackleak_track_stack() from
noinstr functions is expected to be safe, as it isn't runtime
instrumentation and is self-contained.

Cc: Alexander Popov <alex.popov@linux.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 17:02:21 -08:00
Linus Torvalds
eb2eb5161c Merge tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from bpf, netfilter, and ieee802154.

  Current release - regressions:

   - Partially revert "net/smc: Add netlink net namespace support", fix
     uABI breakage

   - netfilter:
      - nft_ct: fix use after free when attaching zone template
      - nft_byteorder: track register operations

  Previous releases - regressions:

   - ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback

   - phy: qca8081: fix speeds lower than 2.5Gb/s

   - sched: fix use-after-free in tc_new_tfilter()

  Previous releases - always broken:

   - tcp: fix mem under-charging with zerocopy sendmsg()

   - tcp: add missing tcp_skb_can_collapse() test in
     tcp_shift_skb_data()

   - neigh: do not trigger immediate probes on NUD_FAILED from
     neigh_managed_work, avoid a deadlock

   - bpf: use VM_MAP instead of VM_ALLOC for ringbuf, avoid KASAN
     false-positives

   - netfilter: nft_reject_bridge: fix for missing reply from prerouting

   - smc: forward wakeup to smc socket waitqueue after fallback

   - ieee802154:
      - return meaningful error codes from the netlink helpers
      - mcr20a: fix lifs/sifs periods
      - at86rf230, ca8210: stop leaking skbs on error paths

   - macsec: add missing un-offload call for NETDEV_UNREGISTER of parent

   - ax25: add refcount in ax25_dev to avoid UAF bugs

   - eth: mlx5e:
      - fix SFP module EEPROM query
      - fix broken SKB allocation in HW-GRO
      - IPsec offload: fix tunnel mode crypto for non-TCP/UDP flows

   - eth: amd-xgbe:
      - fix skb data length underflow
      - ensure reset of the tx_timer_active flag, avoid Tx timeouts

   - eth: stmmac: fix runtime pm use in stmmac_dvr_remove()

   - eth: e1000e: handshake with CSME starts from Alder Lake platforms"

* tag 'net-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  ax25: fix reference count leaks of ax25_dev
  net: stmmac: ensure PTP time register reads are consistent
  net: ipa: request IPA register values be retained
  dt-bindings: net: qcom,ipa: add optional qcom,qmp property
  tools/resolve_btfids: Do not print any commands when building silently
  bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
  net, neigh: Do not trigger immediate probes on NUD_FAILED from neigh_managed_work
  tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data()
  net: sparx5: do not refer to skb after passing it on
  Partially revert "net/smc: Add netlink net namespace support"
  net/mlx5e: Avoid field-overflowing memcpy()
  net/mlx5e: Use struct_group() for memcpy() region
  net/mlx5e: Avoid implicit modify hdr for decap drop rule
  net/mlx5e: IPsec: Fix tunnel mode crypto offload for non TCP/UDP traffic
  net/mlx5e: IPsec: Fix crypto offload for non TCP/UDP encapsulated traffic
  net/mlx5e: Don't treat small ceil values as unlimited in HTB offload
  net/mlx5: E-Switch, Fix uninitialized variable modact
  net/mlx5e: Fix handling of wrong devices during bond netevent
  net/mlx5e: Fix broken SKB allocation in HW-GRO
  net/mlx5e: Fix wrong calculation of header index in HW_GRO
  ...
2022-02-03 16:54:18 -08:00
Jakub Kicinski
77b1b8b43e Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-02-03

We've added 6 non-merge commits during the last 10 day(s) which contain
a total of 7 files changed, 11 insertions(+), 236 deletions(-).

The main changes are:

1) Fix BPF ringbuf to allocate its area with VM_MAP instead of VM_ALLOC
   flag which otherwise trips over KASAN, from Hou Tao.

2) Fix unresolved symbol warning in resolve_btfids due to LSM callback
   rename, from Alexei Starovoitov.

3) Fix a possible race in inc_misses_counter() when IRQ would trigger
   during counter update, from He Fengqing.

4) Fix tooling infra for cross-building with clang upon probing whether
   gcc provides the standard libraries, from Jean-Philippe Brucker.

5) Fix silent mode build for resolve_btfids, from Nathan Chancellor.

6) Drop unneeded and outdated lirc.h header copy from tooling infra as
   BPF does not require it anymore, from Sean Young.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  tools/resolve_btfids: Do not print any commands when building silently
  bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
  tools: Ignore errors from `which' when searching a GCC toolchain
  tools headers UAPI: remove stale lirc.h
  bpf: Fix possible race in inc_misses_counter
  bpf: Fix renaming task_getsecid_subj->current_getsecid_subj.
====================

Link: https://lore.kernel.org/r/20220203155815.25689-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-03 13:42:38 -08:00
Mickaël Salaün
1f2cfdd349 printk: Fix incorrect __user type in proc_dointvec_minmax_sysadmin()
The move of proc_dointvec_minmax_sysadmin() from kernel/sysctl.c to
kernel/printk/sysctl.c introduced an incorrect __user attribute to the
buffer argument.  I spotted this change in [1] as well as the kernel
test robot.  Revert this change to please sparse:

  kernel/printk/sysctl.c:20:51: warning: incorrect type in argument 3 (different address spaces)
  kernel/printk/sysctl.c:20:51:    expected void *
  kernel/printk/sysctl.c:20:51:    got void [noderef] __user *buffer

Fixes: faaa357a55 ("printk: move printk sysctl to printk/sysctl.c")
Link: https://lore.kernel.org/r/20220104155024.48023-2-mic@digikod.net [1]
Reported-by: kernel test robot <lkp@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Mickaël Salaün <mic@linux.microsoft.com>
Link: https://lore.kernel.org/r/20220203145029.272640-1-mic@digikod.net
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 11:27:38 -08:00
Igor Pylypiv
67d6212afd Revert "module, async: async_synchronize_full() on module init iff async is used"
This reverts commit 774a1221e8.

We need to finish all async code before the module init sequence is
done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
thread that called async_schedule().  Then the PF_USED_ASYNC flag was
used to determine whether or not async_synchronize_full() needs to be
invoked.  This works when modprobe thread is calling async_schedule(),
but it does not work if module dispatches init code to a worker thread
which then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based on
a node where device is attached:

	if (cpu < nr_cpu_ids)
		error = work_on_cpu(cpu, local_pci_probe, &ddi);
	else
		error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread.  As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

modprobe pm80xx                      worker
...
  do_init_module()
  ...
    pci_call_probe()
      work_on_cpu(local_pci_probe)
                                     local_pci_probe()
                                       pm8001_pci_probe()
                                         scsi_scan_host()
                                           async_schedule()
                                           worker->flags |= PF_USED_ASYNC;
                                     ...
      < return from worker >
  ...
  if (current->flags & PF_USED_ASYNC) <--- false
  	async_synchronize_full();

Commit 21c3c5d280 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221e8
("module, async: async_synchronize_full() on module init iff async is
used") tried to fix.

Since commit 0fdff3ec6d ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.

Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async we can remove
PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-02-03 11:20:34 -08:00
Linus Torvalds
305e6c42e8 Merge branch 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:

 - Eric's fix for a long standing cgroup1 permission issue where it only
   checks for uid 0 instead of CAP which inadvertently allows
   unprivileged userns roots to modify release_agent userhelper

 - Fixes for the fallout from Waiman's recent cpuset work

* 'for-5.17-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
  cgroup-v1: Require capabilities to set release_agent
  cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
  cgroup/cpuset: Make child cpusets restrict parents on v1 hierarchy
2022-02-03 08:15:13 -08:00
Waiman Long
2bdfd2825c cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
It was found that a "suspicious RCU usage" lockdep warning was issued
with the rcu_read_lock() call in update_sibling_cpumasks().  It is
because the update_cpumasks_hier() function may sleep. So we have
to release the RCU lock, call update_cpumasks_hier() and reacquire
it afterward.

Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
instead of stating that in the comment.

Fixes: 4716909cc5 ("cpuset: Track cpusets that use parent's effective_cpus")
Signed-off-by: Waiman Long <longman@redhat.com>
Tested-by: Phil Auld <pauld@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-03 05:59:01 -10:00
Hou Tao
b293dcc473 bpf: Use VM_MAP instead of VM_ALLOC for ringbuf
After commit 2fd3fb0be1d1 ("kasan, vmalloc: unpoison VM_ALLOC pages
after mapping"), non-VM_ALLOC mappings will be marked as accessible
in __get_vm_area_node() when KASAN is enabled. But now the flag for
ringbuf area is VM_ALLOC, so KASAN will complain out-of-bound access
after vmap() returns. Because the ringbuf area is created by mapping
allocated pages, so use VM_MAP instead.

After the change, info in /proc/vmallocinfo also changes from
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmalloc user
to
  [start]-[end]   24576 ringbuf_map_alloc+0x171/0x290 vmap user

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: syzbot+5ad567a418794b9b5983@syzkaller.appspotmail.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220202060158.6260-1-houtao1@huawei.com
2022-02-02 23:15:24 -08:00
Marco Elver
3c25fc97f5 perf: Copy perf_event_attr::sig_data on modification
The intent has always been that perf_event_attr::sig_data should also be
modifiable along with PERF_EVENT_IOC_MODIFY_ATTRIBUTES, because it is
observable by user space if SIGTRAP on events is requested.

Currently only PERF_TYPE_BREAKPOINT is modifiable, and explicitly copies
relevant breakpoint-related attributes in hw_breakpoint_copy_attr().
This misses copying perf_event_attr::sig_data.

Since sig_data is not specific to PERF_TYPE_BREAKPOINT, introduce a
helper to copy generic event-type-independent attributes on
modification.

Fixes: 97ba62b278 ("perf: Add support for SIGTRAP on perf events")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Link: https://lore.kernel.org/r/20220131103407.1971678-1-elver@google.com
2022-02-02 13:11:40 +01:00
Linus Torvalds
61fda95541 Merge tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
 "A single audit patch to fix problems relating to audit queuing and
  system responsiveness when "audit=1" is specified on the kernel
  command line and the audit daemon is SIGSTOP'd for an extended period
  of time"

* tag 'audit-pr-20220131' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: improve audit queue handling when "audit=1" on cmdline
2022-02-01 11:07:09 -08:00
Eric W. Biederman
24f6008564 cgroup-v1: Require capabilities to set release_agent
The cgroup release_agent is called with call_usermodehelper.  The function
call_usermodehelper starts the release_agent with a full set fo capabilities.
Therefore require capabilities when setting the release_agaent.

Reported-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Tested-by: Tabitha Sable <tabitha.c.sable@gmail.com>
Fixes: 81a6a5cdd2 ("Task Control Groups: automatic userspace notification of idle cgroups")
Cc: stable@vger.kernel.org # v2.6.24+
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-02-01 07:28:00 -10:00
Linus Torvalds
27a96c4feb Merge tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:

 - Prevent accesses to the per-CPU cgroup context list from another CPU
   except the one it belongs to, to avoid list corruption

 - Make sure parent events are always woken up to avoid indefinite hangs
   in the traced workload

* tag 'perf_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix cgroup event list management
  perf: Always wake the parent event
2022-01-30 15:02:32 +02:00
Linus Torvalds
24f4db1f3a Merge tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Borislav Petkov:
 "Make sure the membarrier-rseq fence commands are part of the reported
  set when querying membarrier(2) commands through MEMBARRIER_CMD_QUERY"

* tag 'sched_urgent_for_v5.17_rc2_p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
2022-01-30 13:09:00 +02:00
Suren Baghdasaryan
44585f7bc0 psi: fix "defined but not used" warnings when CONFIG_PROC_FS=n
When CONFIG_PROC_FS is disabled psi code generates the following
warnings:

  kernel/sched/psi.c:1364:30: warning: 'psi_cpu_proc_ops' defined but not used [-Wunused-const-variable=]
      1364 | static const struct proc_ops psi_cpu_proc_ops = {
           |                              ^~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1355:30: warning: 'psi_memory_proc_ops' defined but not used [-Wunused-const-variable=]
      1355 | static const struct proc_ops psi_memory_proc_ops = {
           |                              ^~~~~~~~~~~~~~~~~~~
  kernel/sched/psi.c:1346:30: warning: 'psi_io_proc_ops' defined but not used [-Wunused-const-variable=]
      1346 | static const struct proc_ops psi_io_proc_ops = {
           |                              ^~~~~~~~~~~~~~~

Make definitions of these structures and related functions conditional
on CONFIG_PROC_FS config.

Link: https://lkml.kernel.org/r/20220119223940.787748-3-surenb@google.com
Fixes: 0e94682b73 ("psi: introduce psi monitor")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-30 09:56:58 +02:00
Linus Torvalds
a7b4b0076b Merge tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
 "These make the buffer handling in pm_show_wakelocks() more robust and
  drop an unused hibernation-related function.

  Specifics:

   - Make the buffer handling in pm_show_wakelocks() more robust by
     using sysfs_emit_at() in it to generate output (Greg
     Kroah-Hartman).

   - Drop register_nosave_region_late() which is not used (Amadeusz
     Sławiński)"

* tag 'pm-5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: hibernate: Remove register_nosave_region_late()
  PM: wakeup: simplify the output logic of pm_show_wakelocks()
2022-01-28 20:44:07 +02:00
Linus Torvalds
df0001545b Merge tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pulltracing fixes from Steven Rostedt:

 - Limit mcount build time sorting to only those archs that we know it
   works for.

 - Fix memory leak in error path of histogram setup

 - Fix and clean up rel_loc array out of bounds issue

 - tools/rtla documentation fixes

 - Fix issues with histogram logic

* tag 'trace-v5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Don't inc err_log entry count if entry allocation fails
  tracing: Propagate is_signed to expression
  tracing: Fix smatch warning for do while check in event_hist_trigger_parse()
  tracing: Fix smatch warning for null glob in event_hist_trigger_parse()
  tools/tracing: Update Makefile to build rtla
  rtla: Make doc build optional
  tracing/perf: Avoid -Warray-bounds warning for __rel_loc macro
  tracing: Avoid -Warray-bounds warning for __rel_loc macro
  tracing/histogram: Fix a potential memory leak for kstrdup()
  ftrace: Have architectures opt-in for mcount build time sorting
2022-01-28 19:30:35 +02:00
Linus Torvalds
76fcbc9c7c Merge branch 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull ucount rlimit fix from Eric Biederman.

Make sure the ucounts have a reference to the user namespace it refers
to, so that users that themselves don't carry such a reference around
can safely use the ucount functions.

* 'ucount-rlimit-fixes-for-v5.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  ucount:  Make get_ucount a safe get_user replacement
2022-01-28 19:25:24 +02:00
Linus Torvalds
a773abf72e Merge tag 'rcu-urgent.2022.01.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU fix from Paul McKenney:
 "This fixes a brown-paper-bag bug in RCU tasks that causes things like
  BPF and ftrace to fail miserably on systems with non-power-of-two
  numbers of CPUs.

  It fixes a math error added in 7a30871b6a ("rcu-tasks: Introduce
  ->percpu_enqueue_shift for dynamic queue selection') during the v5.17
  merge window. This commit works correctly only on systems with a
  power-of-two number of CPUs, which just so happens to be the kind that
  rcutorture always uses by default.

  This pull request fixes the math so that things also work on systems
  that don't happen to have a power-of-two number of CPUs"

* tag 'rcu-urgent.2022.01.26a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
  rcu-tasks: Fix computation of CPU-to-list shift counts
2022-01-28 19:19:22 +02:00
Tom Zanussi
67ab5eb71b tracing: Don't inc err_log entry count if entry allocation fails
tr->n_err_log_entries should only be increased if entry allocation
succeeds.

Doing it when it fails won't cause any problems other than wasting an
entry, but should be fixed anyway.

Link: https://lkml.kernel.org/r/cad1ab28f75968db0f466925e7cba5970cec6c29.1643319703.git.zanussi@kernel.org

Cc: stable@vger.kernel.org
Fixes: 2f754e771b ("tracing: Don't inc err_log entry count if entry allocation fails")
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:49 -05:00
Tom Zanussi
097f1eefed tracing: Propagate is_signed to expression
During expression parsing, a new expression field is created which
should inherit the properties of the operands, such as size and
is_signed.

is_signed propagation was missing, causing spurious errors with signed
operands.  Add it in parse_expr() and parse_unary() to fix the problem.

Link: https://lkml.kernel.org/r/f4dac08742fd7a0920bf80a73c6c44042f5eaa40.1643319703.git.zanussi@kernel.org

Cc: stable@vger.kernel.org
Fixes: 100719dcef ("tracing: Add simple expression support to hist triggers")
Reported-by: Yordan Karadzhov <ykaradzhov@vmware.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215513
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:49 -05:00
Tom Zanussi
b59f2f2b86 tracing: Fix smatch warning for do while check in event_hist_trigger_parse()
The patch ec5ce09875: "tracing: Allow whitespace to surround hist
trigger filter" from Jan 15, 2018, leads to the following Smatch
static checker warning:

    kernel/trace/trace_events_hist.c:6199 event_hist_trigger_parse()
    warn: 'p' can't be NULL.

Since p is always checked for a NULL value at the top of loop and
nothing in the rest of the loop will set it to NULL, the warning
is correct and might as well be 1 to silence the warning.

Link: https://lkml.kernel.org/r/a1d4c79766c0cf61e20438dc35244d216633fef6.1643319703.git.zanussi@kernel.org

Fixes: ec5ce09875 ("tracing: Allow whitespace to surround hist trigger filter")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:48 -05:00
Tom Zanussi
798a5b6c19 tracing: Fix smatch warning for null glob in event_hist_trigger_parse()
The recent rename of event_hist_trigger_parse() caused smatch
re-evaluation of trace_events_hist.c and as a result an old warning
was found:

    kernel/trace/trace_events_hist.c:6174 event_hist_trigger_parse()
    error: we previously assumed 'glob' could be null (see line 6166)

glob should never be null (and apparently smatch can also figure that
out and skip the warning when using the cross-function DB (but which
can't be used with a 0day build as it takes too much time to
generate)).

Nonetheless for clarity, remove the test but add a WARN_ON() in case
the code ever changes.

Link: https://lkml.kernel.org/r/96925e5c1f116654ada7ea0613d930b1266b5e1c.1643319703.git.zanussi@kernel.org

Fixes: f404da6e1d ("tracing: Add 'last error' error facility for hist triggers")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:47 -05:00
Xiaoke Wang
e629e7b525 tracing/histogram: Fix a potential memory leak for kstrdup()
kfree() is missing on an error path to free the memory allocated by
kstrdup():

  p = param = kstrdup(data->params[i], GFP_KERNEL);

So it is better to free it via kfree(p).

Link: https://lkml.kernel.org/r/tencent_C52895FD37802832A3E5B272D05008866F0A@qq.com

Cc: stable@vger.kernel.org
Fixes: d380dcde9a ("tracing: Fix now invalid var_ref_vals assumption in trace action")
Signed-off-by: Xiaoke Wang <xkernel.wang@foxmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:44 -05:00
Steven Rostedt (Google)
4ed308c445 ftrace: Have architectures opt-in for mcount build time sorting
First S390 complained that the sorting of the mcount sections at build
time caused the kernel to crash on their architecture. Now PowerPC is
complaining about it too. And also ARM64 appears to be having issues.

It may be necessary to also update the relocation table for the values
in the mcount table. Not only do we have to sort the table, but also
update the relocations that may be applied to the items in the table.

If the system is not relocatable, then it is fine to sort, but if it is,
some architectures may have issues (although x86 does not as it shifts all
addresses the same).

Add a HAVE_BUILDTIME_MCOUNT_SORT that an architecture can set to say it is
safe to do the sorting at build time.

Also update the config to compile in build time sorting in the sorttable
code in scripts/ to depend on CONFIG_BUILDTIME_MCOUNT_SORT.

Link: https://lore.kernel.org/all/944D10DA-8200-4BA9-8D0A-3BED9AA99F82@linux.ibm.com/
Link: https://lkml.kernel.org/r/20220127153821.3bc1ac6e@gandalf.local.home

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yinan Liu <yinan@linux.alibaba.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Reported-by: Sachin Sant <sachinp@linux.ibm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Tested-by: Sachin Sant <sachinp@linux.ibm.com>
Fixes: 72b3942a17 ("scripts: ftrace - move the sort-processing in ftrace_init")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-27 19:15:44 -05:00
Eric W. Biederman
f9d87929d4 ucount: Make get_ucount a safe get_user replacement
When the ucount code was refactored to create get_ucount it was missed
that some of the contexts in which a rlimit is kept elevated can be
the only reference to the user/ucount in the system.

Ordinary ucount references exist in places that also have a reference
to the user namspace, but in POSIX message queues, the SysV shm code,
and the SIGPENDING code there is no independent user namespace
reference.

Inspection of the the user_namespace show no instance of circular
references between struct ucounts and the user_namespace.  So
hold a reference from struct ucount to i's user_namespace to
resolve this problem.

Link: https://lore.kernel.org/lkml/YZV7Z+yXbsx9p3JN@fixkernel.com/
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Reported-by: Mathias Krause <minipli@grsecurity.net>
Tested-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Mathias Krause <minipli@grsecurity.net>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Fixes: d646969055 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Fixes: 6e52a9f053 ("Reimplement RLIMIT_MSGQUEUE on top of ucounts")
Fixes: d7c9e99aee ("Reimplement RLIMIT_MEMLOCK on top of ucounts")
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-26 18:34:11 -06:00
Paul E. McKenney
da123016ca rcu-tasks: Fix computation of CPU-to-list shift counts
The ->percpu_enqueue_shift field is used to map from the running CPU
number to the index of the corresponding callback list.  This mapping
can change at runtime in response to varying callback load, resulting
in varying levels of contention on the callback-list locks.

Unfortunately, the initial value of this field is correct only if the
system happens to have a power-of-two number of CPUs, otherwise the
callbacks from the high-numbered CPUs can be placed into the callback list
indexed by 1 (rather than 0), and those index-1 callbacks will be ignored.
This can result in soft lockups and hangs.

This commit therefore corrects this mapping, adding one to this shift
count as needed for systems having odd numbers of CPUs.

Fixes: 7a30871b6a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Reported-by: Martin Lau <kafai@fb.com>
Cc: Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
2022-01-26 13:04:05 -08:00
Tianchen Ding
c80d401c52 cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask()
subparts_cpus should be limited as a subset of cpus_allowed, but it is
updated wrongly by using cpumask_andnot(). Use cpumask_and() instead to
fix it.

Fixes: ee8dde0cd2 ("cpuset: Add new v2 cpuset.sched.partition flag")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2022-01-26 06:49:30 -10:00
Namhyung Kim
c5de60cd62 perf/core: Fix cgroup event list management
The active cgroup events are managed in the per-cpu cgrp_cpuctx_list.
This list is only accessed from current cpu and not protected by any
locks.  But from the commit ef54c1a476 ("perf: Rework
perf_event_exit_event()"), it's possible to access (actually modify)
the list from another cpu.

In the perf_remove_from_context(), it can remove an event from the
context without an IPI when the context is not active.  This is not
safe with cgroup events which can have some active events in the
context even if ctx->is_active is 0 at the moment.  The target cpu
might be in the middle of list iteration at the same time.

If the event is enabled when it's about to be closed, it might call
perf_cgroup_event_disable() and list_del() with the cgrp_cpuctx_list
on a different cpu.

This resulted in a crash due to an invalid list pointer access during
the cgroup list traversal on the cpu which the event belongs to.

Let's fallback to IPI to access the cgrp_cpuctx_list from that cpu.
Similarly, perf_install_in_context() should use IPI for the cgroup
events too.

Fixes: ef54c1a476 ("perf: Rework perf_event_exit_event()")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220124195808.2252071-1-namhyung@kernel.org
2022-01-26 15:06:06 +01:00
James Clark
961c391217 perf: Always wake the parent event
When using per-process mode and event inheritance is set to true,
forked processes will create a new perf events via inherit_event() ->
perf_event_alloc(). But these events will not have ring buffers
assigned to them. Any call to wakeup will be dropped if it's called on
an event with no ring buffer assigned because that's the object that
holds the wakeup list.

If the child event is disabled due to a call to
perf_aux_output_begin() or perf_aux_output_end(), the wakeup is
dropped leaving userspace hanging forever on the poll.

Normally the event is explicitly re-enabled by userspace after it
wakes up to read the aux data, but in this case it does not get woken
up so the event remains disabled.

This can be reproduced when using Arm SPE and 'stress' which forks once
before running the workload. By looking at the list of aux buffers read,
it's apparent that they stop after the fork:

  perf record -e arm_spe// -vvv -- stress -c 1

With this patch applied they continue to be printed. This behaviour
doesn't happen when using systemwide or per-cpu mode.

Reported-by: Ruben Ayrapetyan <Ruben.Ayrapetyan@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20211206113840.130802-2-james.clark@arm.com
2022-01-26 15:06:06 +01:00
He Fengqing
0e3135d3bf bpf: Fix possible race in inc_misses_counter
It seems inc_misses_counter() suffers from same issue fixed in
the commit d979617aa8 ("bpf: Fixes possible race in update_prog_stats()
for 32bit arches"):
As it can run while interrupts are enabled, it could
be re-entered and the u64_stats syncp could be mangled.

Fixes: 9ed9e9ba23 ("bpf: Count the number of times recursion was prevented")
Signed-off-by: He Fengqing <hefengqing@huawei.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20220122102936.1219518-1-hefengqing@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-25 17:50:03 -08:00
Mathieu Desnoyers
809232619f sched/membarrier: Fix membarrier-rseq fence command missing from query bitmask
The membarrier command MEMBARRIER_CMD_QUERY allows querying the
available membarrier commands. When the membarrier-rseq fence commands
were added, a new MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK was
introduced with the intent to expose them with the MEMBARRIER_CMD_QUERY
command, the but it was never added to MEMBARRIER_CMD_BITMASK.

The membarrier-rseq fence commands are therefore not wired up with the
query command.

Rename MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK to
MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK (the bitmask is not a command
per-se), and change the erroneous
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK (which does not
actually exist) to MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ.

Wire up MEMBARRIER_PRIVATE_EXPEDITED_RSEQ_BITMASK in
MEMBARRIER_CMD_BITMASK. Fixing this allows discovering availability of
the membarrier-rseq fence feature.

Fixes: 2a36ab717e ("rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # 5.10+
Link: https://lkml.kernel.org/r/20220117203010.30129-1-mathieu.desnoyers@efficios.com
2022-01-25 22:30:25 +01:00
Paul Moore
f26d043313 audit: improve audit queue handling when "audit=1" on cmdline
When an admin enables audit at early boot via the "audit=1" kernel
command line the audit queue behavior is slightly different; the
audit subsystem goes to greater lengths to avoid dropping records,
which unfortunately can result in problems when the audit daemon is
forcibly stopped for an extended period of time.

This patch makes a number of changes designed to improve the audit
queuing behavior so that leaving the audit daemon in a stopped state
for an extended period does not cause a significant impact to the
system.

- kauditd_send_queue() is now limited to looping through the
  passed queue only once per call.  This not only prevents the
  function from looping indefinitely when records are returned
  to the current queue, it also allows any recovery handling in
  kauditd_thread() to take place when kauditd_send_queue()
  returns.

- Transient netlink send errors seen as -EAGAIN now cause the
  record to be returned to the retry queue instead of going to
  the hold queue.  The intention of the hold queue is to store,
  perhaps for an extended period of time, the events which led
  up to the audit daemon going offline.  The retry queue remains
  a temporary queue intended to protect against transient issues
  between the kernel and the audit daemon.

- The retry queue is now limited by the audit_backlog_limit
  setting, the same as the other queues.  This allows admins
  to bound the size of all of the audit queues on the system.

- kauditd_rehold_skb() now returns records to the end of the
  hold queue to ensure ordering is preserved in the face of
  recent changes to kauditd_send_queue().

Cc: stable@vger.kernel.org
Fixes: 5b52330bbf ("audit: fix auditd/kernel connection state tracking")
Fixes: f4b3ee3c85 ("audit: improve robustness of the audit queue handling")
Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2022-01-25 13:22:51 -05:00
Amadeusz Sławiński
33569ef3c7 PM: hibernate: Remove register_nosave_region_late()
It is an unused wrapper forcing kmalloc allocation for registering
nosave regions. Also, rename __register_nosave_region() to
register_nosave_region() now that there is no need for disambiguation.

Signed-off-by: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Reviewed-by: Cezary Rojewski <cezary.rojewski@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-25 18:34:08 +01:00
Greg Kroah-Hartman
c9d967b2ce PM: wakeup: simplify the output logic of pm_show_wakelocks()
The buffer handling in pm_show_wakelocks() is tricky, and hopefully
correct.  Ensure it really is correct by using sysfs_emit_at() which
handles all of the tricky string handling logic in a PAGE_SIZE buffer
for us automatically as this is a sysfs file being read from.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-25 18:27:02 +01:00
Alexei Starovoitov
63ee956f69 bpf: Fix renaming task_getsecid_subj->current_getsecid_subj.
The commit 6326948f94 missed renaming of task->current LSM hook in BTF_ID.
Fix it to silence build warning:
WARN: resolve_btfids: unresolved symbol bpf_lsm_task_getsecid_subj

Fixes: 6326948f94 ("lsm: security_task_getsecid_subj() -> security_current_getsecid_subj()")
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-01-24 20:20:51 -08:00
Linus Torvalds
dd81e1c7d5 Merge tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:

 - A series of bpf fixes, including an oops fix and some codegen fixes.

 - Fix a regression in syscall_get_arch() for compat processes.

 - Fix boot failure on some 32-bit systems with KASAN enabled.

 - A couple of other build/minor fixes.

Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa,
Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin.

* tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Mask SRR0 before checking against the masked NIP
  powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
  powerpc/32s: Fix kasan_init_region() for KASAN
  powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
  powerpc/audit: Fix syscall_get_arch()
  powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
  tools/bpf: Rename 'struct event' to avoid naming conflict
  powerpc/bpf: Update ldimm64 instructions during extra pass
  powerpc32/bpf: Fix codegen for bpf-to-bpf calls
  bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-23 17:52:42 +02:00