linux/arch/x86/kernel
Xiaochen Shen b8511ccc75 x86/resctrl: Fix use-after-free when deleting resource groups
A resource group (rdtgrp) contains a reference count (rdtgrp->waitcount)
that indicates how many waiters expect this rdtgrp to exist. Waiters
could be waiting on rdtgroup_mutex or some work sitting on a task's
workqueue for when the task returns from kernel mode or exits.

The deletion of a rdtgrp is intended to have two phases:

  (1) while holding rdtgroup_mutex the necessary cleanup is done and
  rdtgrp->flags is set to RDT_DELETED,

  (2) after releasing the rdtgroup_mutex, the rdtgrp structure is freed
  only if there are no waiters and its flag is set to RDT_DELETED. Upon
  gaining access to rdtgroup_mutex or rdtgrp, a waiter is required to check
  for the RDT_DELETED flag.

When unmounting the resctrl file system or deleting ctrl_mon groups,
all of the subdirectories are removed and the data structure of rdtgrp
is forcibly freed without checking rdtgrp->waitcount. If at this point
there was a waiter on rdtgrp then a use-after-free issue occurs when the
waiter starts running and accesses the rdtgrp structure it was waiting
on.

See kfree() calls in [1], [2] and [3] in these two call paths in
following scenarios:
(1) rdt_kill_sb() -> rmdir_all_sub() -> free_all_child_rdtgrp()
(2) rdtgroup_rmdir() -> rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()

There are several scenarios that result in use-after-free issue in
following:

Scenario 1:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdt_kill_sb() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdt_kill_sb)
-------------------------------        ----------------------
rdtgroup_kn_lock_live
  atomic_inc(&rdtgrp->waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&rdtgrp->waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
                                       mutex_lock
                                       rmdir_all_sub
                                         /*
                                          * sentry and rdtgrp are freed
                                          * without checking refcount
                                          */
                                         free_all_child_rdtgrp
                                           kfree(sentry)*[1]
                                         kfree(rdtgrp)*[2]
                                       mutex_unlock
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [1] or [2].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    kfree(rdtgrp)

Scenario 2:
-----------
In Thread 1, rdtgroup_tasks_write() adds a task_work callback
move_myself(). If move_myself() is scheduled to execute after Thread 2
rdtgroup_rmdir() is finished, referring to earlier rdtgrp memory
(rdtgrp->waitcount) which was already freed in Thread 2 results in
use-after-free issue.

Thread 1 (rdtgroup_tasks_write)        Thread 2 (rdtgroup_rmdir)
-------------------------------        -------------------------
rdtgroup_kn_lock_live
  atomic_inc(&rdtgrp->waitcount)
  mutex_lock
rdtgroup_move_task
  __rdtgroup_move_task
    /*
     * Take an extra refcount, so rdtgrp cannot be freed
     * before the call back move_myself has been invoked
     */
    atomic_inc(&rdtgrp->waitcount)
    /* Callback move_myself will be scheduled for later */
    task_work_add(move_myself)
rdtgroup_kn_unlock
  mutex_unlock
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
                                       rdtgroup_kn_lock_live
                                         atomic_inc(&rdtgrp->waitcount)
                                         mutex_lock
                                       rdtgroup_rmdir_ctrl
                                         free_all_child_rdtgrp
                                           /*
                                            * sentry is freed without
                                            * checking refcount
                                            */
                                           kfree(sentry)*[3]
                                         rdtgroup_ctrl_remove
                                           rdtgrp->flags = RDT_DELETED
                                       rdtgroup_kn_unlock
                                         mutex_unlock
                                         atomic_dec_and_test(
                                                     &rdtgrp->waitcount)
                                         && (flags & RDT_DELETED)
                                           kfree(rdtgrp)
/*
 * Callback is scheduled to execute
 * after rdt_kill_sb is finished
 */
move_myself
  /*
   * Use-after-free: refer to earlier rdtgrp
   * memory which was freed in [3].
   */
  atomic_dec_and_test(&rdtgrp->waitcount)
  && (flags & RDT_DELETED)
    kfree(rdtgrp)

If CONFIG_DEBUG_SLAB=y, Slab corruption on kmalloc-2k can be observed
like following. Note that "0x6b" is POISON_FREE after kfree(). The
corrupted bits "0x6a", "0x64" at offset 0x424 correspond to
waitcount member of struct rdtgroup which was freed:

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c5b0d000, len=2048
  420: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkjkkkkkkkkkkk
  Single bit error detected. Probably bad RAM.
  Run memtest86+ or a similar memory test tool.
  Next obj: start=ffff9504c5b0d800, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

  Slab corruption (Not tainted): kmalloc-2k start=ffff9504c58ab800, len=2048
  420: 6b 6b 6b 6b 64 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkdkkkkkkkkkkk
  Prev obj: start=ffff9504c58ab000, len=2048
  000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk

Fix this by taking reference count (waitcount) of rdtgrp into account in
the two call paths that currently do not do so. Instead of always
freeing the resource group it will only be freed if there are no waiters
on it. If there are waiters, the resource group will have its flags set
to RDT_DELETED.

It will be left to the waiter to free the resource group when it starts
running and finding that it was the last waiter and the resource group
has been removed (rdtgrp->flags & RDT_DELETED) since. (1) rdt_kill_sb()
-> rmdir_all_sub() -> free_all_child_rdtgrp() (2) rdtgroup_rmdir() ->
rdtgroup_rmdir_ctrl() -> free_all_child_rdtgrp()

Fixes: f3cbeacaa0 ("x86/intel_rdt/cqm: Add rmdir support")
Fixes: 60cf5e101f ("x86/intel_rdt: Add mkdir to resctrl file system")
Suggested-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1578500886-21771-2-git-send-email-xiaochen.shen@intel.com
2020-01-20 16:45:43 +01:00
..
acpi x86/asm/32: Add ENDs to some functions and relabel with SYM_CODE_* 2019-10-18 11:58:33 +02:00
apic Merge branch 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 09:52:37 -08:00
cpu x86/resctrl: Fix use-after-free when deleting resource groups 2020-01-20 16:45:43 +01:00
fpu treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kprobes Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 15:04:47 -08:00
.gitignore
alternative.c x86/alternatives: Teach text_poke_bp() to emulate instructions 2019-11-15 14:07:01 -08:00
amd_gart_64.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
amd_nb.c x86/amd_nb: Add PCI device IDs for family 17h, model 70h 2019-09-03 12:47:17 -07:00
apb_timer.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 441 2019-06-05 17:37:17 +02:00
aperture_64.c x86/gart: Exclude GART aperture from kcore 2019-03-23 12:11:49 +01:00
apm_32.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118 2019-05-24 17:39:02 +02:00
asm-offsets_32.c
asm-offsets_64.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:04:39 -07:00
asm-offsets.c x86/paravirt: Make read_cr2() CALLEE_SAVE 2019-07-17 23:17:37 +02:00
audit_64.c
bootflag.c
check.c x86/headers: Fix -Wmissing-prototypes warning 2018-11-23 07:59:59 +01:00
cpuid.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 142 2019-05-30 11:25:17 -07:00
crash_dump_32.c
crash_dump_64.c fs/core/vmcore: Move sev_active() reference to x86 arch code 2019-08-09 22:52:10 +10:00
crash.c x86/crash: Align function arguments on opening braces 2019-11-14 18:24:55 +01:00
devicetree.c x86/headers: Fix -Wmissing-prototypes warning 2018-11-23 07:59:59 +01:00
doublefault_32.c x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit 2019-11-26 22:00:04 +01:00
dumpstack_32.c x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area 2019-11-26 21:53:34 +01:00
dumpstack_64.c x86/dumpstack/64: Don't evaluate exception stacks before setup 2019-11-05 00:51:35 +01:00
dumpstack.c x86/dumpstack: Indicate PREEMPT_RT in dumps 2019-07-31 19:03:36 +02:00
e820.c ACPI updates for 5.5-rc1 2019-11-26 19:25:25 -08:00
early_printk.c efi/x86: Convert x86 EFI earlyprintk into generic earlycon implementation 2019-02-04 08:27:30 +01:00
early-quirks.c x86/intel: Disable HPET on Intel Ice Lake platforms 2019-11-29 12:17:58 +01:00
ebda.c
eisa.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 243 2019-06-19 17:09:07 +02:00
espfix_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
ftrace_32.S x86/ftrace: Get rid of function_hook 2019-10-25 10:52:22 +02:00
ftrace_64.S New tracing features: 2019-11-27 11:42:01 -08:00
ftrace.c ftrace: Fix function_graph tracer interaction with BPF trampoline 2019-12-10 13:53:59 -05:00
head32.c x86/boot: Mostly revert commit ae7e1238e6 ("Add ACPI RSDP address to setup_header") 2018-11-20 09:43:10 +01:00
head64.c x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area 2019-10-11 18:38:15 +02:00
head_32.S Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 10:42:40 -08:00
head_64.S x86/asm/64: Change all ENTRY+END to SYM_CODE_* 2019-10-18 11:58:26 +02:00
hpet.c x86/hpet: Undo the early counter is counting check 2019-07-25 12:21:32 +02:00
hw_breakpoint.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
i8237.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
i8253.c x86/timer: Skip PIT initialization on modern chipsets 2019-06-29 11:35:35 +02:00
i8259.c
idt.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 11:22:57 -07:00
ima_arch.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
io_delay.c x86/io_delay: Define IO_DELAY macros in C instead of Kconfig 2019-05-24 08:46:06 +02:00
ioport.c x86/ioperm: Extend IOPL config to control ioperm() as well 2019-11-16 11:24:06 +01:00
irq_32.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_64.c x86/irq: Move IS_ERR_OR_NULL() check into common do_IRQ() code 2019-08-19 23:19:06 +02:00
irq_work.c
irq.c x86/irq: Check for VECTOR_UNUSED directly 2019-08-19 23:19:07 +02:00
irqflags.S x86/asm: Change all ENTRY+ENDPROC to SYM_FUNC_* 2019-10-18 11:58:33 +02:00
irqinit.c x86/irq/32: Handle irq stack allocation failure proper 2019-04-17 15:31:42 +02:00
itmt.c proc/sysctl: add shared variables for range check 2019-07-18 17:08:07 -07:00
jailhouse.c x86/jailhouse: Only enable platform UARTs if available 2019-10-10 15:43:59 +02:00
jump_label.c x86/alternatives: Teach text_poke_bp() to emulate instructions 2019-11-15 14:07:01 -08:00
kdebugfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kexec-bzimage64.c Merge branch 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2019-09-28 08:14:15 -07:00
kgdb.c x86/apic: Provide and use helper for send_IPI_allbutself() 2019-07-25 16:12:00 +02:00
ksysfs.c x86/boot: Introduce setup_indirect 2019-11-12 16:21:15 +01:00
kvm.c x86/kvm: Fix -Wmissing-prototypes warnings 2019-10-25 14:01:14 +02:00
kvmclock.c x86: kvmguest: use TSC clocksource if invariant TSC is exposed 2019-02-20 22:48:52 +01:00
ldt.c x86: Convert some slow-path static_cpu_has() callers to boot_cpu_has() 2019-04-08 12:13:34 +02:00
livepatch.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 13 2019-05-21 11:28:45 +02:00
machine_kexec_32.c x86/mm: Remove set_pages_x() and set_pages_nx() 2019-09-03 09:26:37 +02:00
machine_kexec_64.c x86/kdump: Remove the backup region handling 2019-11-14 18:24:43 +01:00
Makefile x86/doublefault/32: Rename doublefault.c to doublefault_32.c 2019-11-26 21:53:34 +01:00
mmconf-fam10h_64.c
module.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 156 2019-05-30 11:26:35 -07:00
mpparse.c x86/boot: Fix memory leak in default_get_smp_config() 2019-07-16 23:13:48 +02:00
msr.c x86/msr: Restrict MSR access when the kernel is locked down 2019-08-19 21:54:16 -07:00
nmi_selftest.c
nmi.c x86/hotplug: Silence APIC and NMI when CPU is dead 2019-07-25 16:11:59 +02:00
paravirt_patch.c x86/paravirt: Standardize 'insn_buff' variable names 2019-04-29 16:05:49 +02:00
paravirt-spinlocks.c
paravirt.c x86/iopl: Remove legacy IOPL option 2019-11-16 11:24:05 +01:00
pci-dma.c dma-mapping updates for 5.5-rc1 2019-11-28 11:16:43 -08:00
pci-iommu_table.c
pci-swiotlb.c dma-mapping: fix filename references 2019-09-03 08:36:30 +02:00
pcspeaker.c
perf_regs.c perf/x86/regs: Check reserved bits 2019-06-24 19:19:24 +02:00
platform-quirks.c
pmem.c
probe_roms.c
process_32.c x86/iopl: Remove legacy IOPL option 2019-11-16 11:24:05 +01:00
process_64.c x86/iopl: Remove legacy IOPL option 2019-11-16 11:24:05 +01:00
process.c x86/ioperm: Save an indentation level in tss_update_io_bitmap() 2019-11-30 18:06:56 +01:00
process.h x86: Use the correct SPDX License Identifier in headers 2019-10-01 20:31:35 +02:00
ptrace.c x86/ptrace: Document FSBASE and GSBASE ABI oddities 2019-11-26 22:00:12 +01:00
pvclock.c x86/vdso: Switch to generic vDSO implementation 2019-06-22 21:21:10 +02:00
quirks.c x86/PCI: Remove superfluous returns from void functions 2019-08-20 09:54:36 +02:00
reboot_fixups_32.c
reboot.c x86/apic: Provide and use helper for send_IPI_allbutself() 2019-07-25 16:12:00 +02:00
relocate_kernel_32.S x86/asm: Annotate relocate_kernel_{32,64}.c 2019-10-18 09:53:19 +02:00
relocate_kernel_64.S x86/asm: Annotate relocate_kernel_{32,64}.c 2019-10-18 09:53:19 +02:00
resource.c
rtc.c
setup_percpu.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
setup.c ACPI updates for 5.5-rc1 2019-11-26 19:25:25 -08:00
signal_compat.c
signal.c x86: use static_cpu_has in uaccess region to avoid instrumentation 2019-07-12 11:05:42 -07:00
smp.c x86/smp: Move smp_function_call implementations into IPI code 2019-07-25 16:12:01 +02:00
smpboot.c Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-09-17 12:04:39 -07:00
stacktrace.c x86/stacktrace: Prevent access_ok() warnings in arch_stack_walk_user() 2019-07-22 10:42:36 +02:00
step.c
sys_x86_64.c
sysfb_efi.c x86/sysfb_efi: Add quirks for some devices with swapped width and height 2019-07-22 10:47:11 +02:00
sysfb_simplefb.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
sysfb.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 2019-05-30 11:26:32 -07:00
tboot.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
time.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 16:59:34 -07:00
tls.c x86/tls: Fix possible spectre-v1 in do_get_thread_area() 2019-06-27 23:48:04 +02:00
tls.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 193 2019-05-30 11:29:21 -07:00
topology.c x86/topology: Make DEBUG_HOTPLUG_CPU0 pr_info() more descriptive 2019-04-19 19:42:57 +02:00
trace_clock.c
tracepoint.c x86/kernel: Fix more -Wmissing-prototypes warnings 2018-12-08 12:24:35 +01:00
traps.c x86/traps: die() instead of panicking on a double fault 2019-11-26 22:00:12 +01:00
tsc_msr.c x86/cpu: Update init data for new Airmont CPU model 2019-09-06 07:30:40 +02:00
tsc_sync.c x86: Use pr_warn instead of pr_warning 2019-10-18 15:00:18 +02:00
tsc.c x86/tsc: Respect tsc command line paraemeter for clocksource_tsc_early 2019-11-05 01:24:56 +01:00
umip.c Merge branches 'x86-cpu-for-linus' and 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-11-26 08:58:08 -08:00
unwind_frame.c x86/stackframe/32: Provide consistent pt_regs 2019-06-25 10:23:47 +02:00
unwind_guess.c treewide: Add SPDX license identifier for missed files 2019-05-21 10:50:45 +02:00
unwind_orc.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-07-08 16:59:34 -07:00
uprobes.c x86/apic, x86/uprobes: Correct parameter names in kernel-doc comments 2019-10-27 09:00:28 +01:00
verify_cpu.S x86/asm: Annotate local pseudo-functions 2019-10-18 10:04:04 +02:00
vm86_32.c signal: Remove task parameter from force_sig 2019-05-27 09:36:28 -05:00
vmlinux.lds.S x86/vmlinux: Use INT3 instead of NOP for linker fill bytes 2019-11-04 19:10:08 +01:00
vsmp_64.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 346 2019-06-05 17:37:08 +02:00
x86_init.c x86/init: Allow DT configured systems to disable RTC at boot time 2019-11-12 15:46:53 +01:00