linux/kernel/locking
Mel Gorman 1c0908d8e4 rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.

 kernel BUG at fs/inode.c:625!
 Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
 CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
 pc : clear_inode+0xa0/0xc0
 lr : clear_inode+0x38/0xc0
 Call trace:
  clear_inode+0xa0/0xc0
  evict+0x160/0x180
  iput+0x154/0x240
  do_unlinkat+0x184/0x300
  __arm64_sys_unlinkat+0x48/0xc0
  el0_svc_common.constprop.4+0xe4/0x2c0
  do_el0_svc+0xac/0x100
  el0_svc+0x78/0x200
  el0t_64_sync_handler+0x9c/0xc0
  el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and a preempt-rt fork of 5.14, so this is
likely a bug that has existed forever and only became visible when arm64
support was added to preempt-rt. The same problem does not occur on
x86-64. Jan also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear, indicating that the RT
spinlock variant is the problem.

This in turn means that RT mutexes on ARM64, and on any other weakly
ordered architecture, are affected by this independent of RT.

Will Deacon observed:

  "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
   make sure that we have acquire semantics on all paths where the lock can
   be taken. Looking at the rtmutex code, this really isn't obvious to me
   -- for example, try_to_take_rt_mutex() appears to be able to return via
   the 'takeit' label without acquire semantics and it looks like we might
   be relying on the caller's subsequent _unlock_ of the wait_lock for
   ordering, but that will give us release semantics which aren't correct."

Sebastian Andrzej Siewior prototyped a working fix based on that
comment, but it was somewhat overkill and added some fences that should
not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide the acquire semantics that are needed when
acquiring a mutex.
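
A rough sketch of the idea (simplified, not the exact patch): the owner
store in the slow path becomes an acquire RMW rather than a plain
WRITE_ONCE(), so it pairs with the release performed by the previous
owner on unlock. rt_mutex_owner_encode() below is shorthand for folding
the RT_MUTEX_HAS_WAITERS bit into the owner value.

  static __always_inline void
  rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
  {
          /*
           * lock->wait_lock is held, but its unlock only gives release
           * ordering. The new owner must be installed with acquire
           * semantics so that its critical section cannot be reordered
           * before the previous owner's release; a plain WRITE_ONCE()
           * is insufficient.
           */
          xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
  }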

Add the necessary acquire semantics for lock owner updates in the slow
path acquisition and the waiter bit logic.
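
Only the paths that actually take the lock need the stronger ordering
when clearing a stale waiter bit; a sketch of that distinction (the
acquire_lock argument and its callers are simplified here):

  static __always_inline void
  fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
  {
          unsigned long owner, *p = (unsigned long *) &lock->owner;

          /* If there are still waiters, the bit must stay set. */
          if (rt_mutex_has_waiters(lock))
                  return;

          owner = READ_ONCE(*p);
          if (owner & RT_MUTEX_HAS_WAITERS) {
                  /*
                   * Clearing the bit as part of taking the lock must
                   * provide acquire semantics; on the non-acquisition
                   * paths a plain store remains sufficient.
                   */
                  if (acquire_lock)
                          xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
                  else
                          WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
          }
  }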

With the fix applied, the kernel successfully completed 10 iterations of
the dbench workload, whereas the vanilla kernel failed on the first
iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
2022-12-12 19:55:56 +01:00
irqflag-debug.c lockdep: Noinstr annotate warn_bogus_irq_restore() 2021-02-10 14:44:39 +01:00
lock_events_list.h locking/rwsem: Remove reader optimistic spinning 2020-12-09 17:08:48 +01:00
lock_events.c locking/lock_events: Don't show pvqspinlock events on bare metal 2019-04-10 10:56:05 +02:00
lock_events.h locking/lock_events: Use raw_cpu_{add,inc}() for stats 2019-06-03 12:32:56 +02:00
lockdep_internals.h locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-02-16 15:57:58 +01:00
lockdep_proc.c locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-02-16 15:57:58 +01:00
lockdep_states.h
lockdep.c locking/lockdep: Print more debug information - report name and key when look_up_lock_class() got confused 2022-09-21 09:58:21 +02:00
locktorture.c locktorture,rcutorture,torture: Always log error message 2021-12-07 16:36:17 -08:00
Makefile kmsan: disable instrumentation of unsupported common kernel code 2022-10-03 14:03:20 -07:00
mcs_spinlock.h locking: Fix typos in comments 2021-03-22 02:45:52 +01:00
mutex-debug.c locking/ww_mutex: Gather mutex_waiter initialization 2021-08-17 19:04:41 +02:00
mutex.c locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning 2022-04-05 10:24:36 +02:00
mutex.h locking/mutex: Move the 'struct mutex_waiter' definition from <linux/mutex.h> to the internal header 2021-08-17 18:24:31 +02:00
osq_lock.c locking: Fix typos in comments 2021-03-22 02:45:52 +01:00
percpu-rwsem.c locking/percpu-rwsem: Add percpu_is_write_locked() and percpu_is_read_locked() 2022-08-30 10:56:23 +02:00
qrwlock.c locking: Add __lockfunc to slow path functions 2022-08-19 19:47:51 +02:00
qspinlock_paravirt.h locking: Add __lockfunc to slow path functions 2022-08-19 19:47:51 +02:00
qspinlock_stat.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
qspinlock.c locking: Add __lockfunc to slow path functions 2022-08-19 19:47:51 +02:00
rtmutex_api.c rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2022-12-12 19:55:56 +01:00
rtmutex_common.h locking/rtmutex: Dont dereference waiter lockless 2021-08-25 15:42:32 +02:00
rtmutex.c rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2022-12-12 19:55:56 +01:00
rwbase_rt.c locking: Apply contention tracepoints in the slow path 2022-04-05 10:24:35 +02:00
rwsem.c locking/rwsem: Disable preemption while trying for rwsem lock 2022-09-15 16:14:02 +02:00
semaphore.c locking: Add __sched to semaphore functions 2022-09-15 16:14:03 +02:00
spinlock_debug.c locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
spinlock_rt.c locking/rwlocks: introduce write_lock_nested 2022-01-22 08:33:37 +02:00
spinlock.c locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled 2022-08-04 11:05:43 +02:00
test-ww_mutex.c treewide: use prandom_u32_max() when possible, part 1 2022-10-11 17:42:55 -06:00
ww_mutex.h locking/ww_mutex: Add rt_mutex based lock type and accessors 2021-08-17 19:05:11 +02:00
ww_rt_mutex.c kernel/locking: Use a pointer in ww_mutex_trylock(). 2021-11-17 14:48:49 +01:00