linux/kernel/locking
Wander Lairson Costa db370a8b9f rtmutex: Ensure that the top waiter is always woken up
Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following events will lead to a state in which the wait queue of L2
isn't empty, but no task actually holds the lock.

T1                T2                                  T3
==                ==                                  ==

                                                      spin_lock(L1)
                                                      | raw_spin_lock(L1->wait_lock)
                                                      | rtlock_slowlock_locked(L1)
                                                      | | task_blocks_on_rt_mutex(L1, T3)
                                                      | | | orig_waiter->lock = L1
                                                      | | | orig_waiter->task = T3
                                                      | | | raw_spin_unlock(L1->wait_lock)
                                                      | | | rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)
                  spin_unlock(L2)                     | | | |
                  | rt_mutex_slowunlock(L2)           | | | |
                  | | raw_spin_lock(L2->wait_lock)    | | | |
                  | | wakeup(T1)                      | | | |
                  | | raw_spin_unlock(L2->wait_lock)  | | | |
                                                      | | | | waiter = T1->pi_blocked_on
                                                      | | | | waiter == rt_mutex_top_waiter(L2)
                                                      | | | | waiter->task == T1
                                                      | | | | raw_spin_lock(L2->wait_lock)
                                                      | | | | dequeue(L2, waiter)
                                                      | | | | update_prio(waiter, T1)
                                                      | | | | enqueue(L2, waiter)
                                                      | | | | waiter != rt_mutex_top_waiter(L2)
                                                      | | | | L2->owner == NULL
                                                      | | | | wakeup(T1)
                                                      | | | | raw_spin_unlock(L2->wait_lock)
T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()

If the deadline of T1 is updated before the call to update_prio(), and the
new deadline is greater than the deadline of the second top waiter, then
after the requeue, T1 is no longer the top waiter, and the wrong task is
woken up which will then go back to sleep because it is not the top waiter.

This can be reproduced in PREEMPT_RT with stress-ng:

while true; do
    stress-ng --sched deadline --sched-period 1000000000 \
    	    --sched-runtime 800000000 --sched-deadline \
    	    1000000000 --mmapfork 23 -t 20
done

A similar issue was pointed out by Thomas versus the cases where the top
waiter drops out early due to a signal or timeout, which is a general issue
for all regular rtmutex use cases, e.g. futex.

The problematic code is in rt_mutex_adjust_prio_chain():

    	// Save the top waiter before dequeue/enqueue
	prerequeue_top_waiter = rt_mutex_top_waiter(lock);

	rt_mutex_dequeue(lock, waiter);
	waiter_update_prio(waiter, task);
	rt_mutex_enqueue(lock, waiter);

	// Lock has no owner?
	if (!rt_mutex_owner(lock)) {
	   	// Top waiter changed		      			   
  ---->		if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
  ---->			wake_up_state(waiter->task, waiter->wake_state);

This only takes the case into account where @waiter is the new top waiter
due to the requeue operation.

But it fails to handle the case where @waiter is not longer the top
waiter due to the requeue operation.

Ensure that the new top waiter is woken up so in all cases so it can take
over the ownerless lock.

[ tglx: Amend changelog, add Fixes tag ]

Fixes: c014ef69b3 ("locking/rtmutex: Add wake_state to rt_mutex_waiter")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230117172649.52465-1-wander@redhat.com
Link: https://lore.kernel.org/r/20230202123020.14844-1-wander@redhat.com
2023-02-06 14:49:13 +01:00
..
irqflag-debug.c lockdep: Noinstr annotate warn_bogus_irq_restore() 2021-02-10 14:44:39 +01:00
lock_events_list.h locking/rwsem: Remove reader optimistic spinning 2020-12-09 17:08:48 +01:00
lock_events.c locking/lock_events: Don't show pvqspinlock events on bare metal 2019-04-10 10:56:05 +02:00
lock_events.h locking/lock_events: Use raw_cpu_{add,inc}() for stats 2019-06-03 12:32:56 +02:00
lockdep_internals.h locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-02-16 15:57:58 +01:00
lockdep_proc.c locking/lockdep: Iterate lock_classes directly when reading lockdep files 2022-02-16 15:57:58 +01:00
lockdep_states.h locking/lockdep: Rework FS_RECLAIM annotation 2017-08-10 12:29:03 +02:00
lockdep.c locking/lockdep: Print more debug information - report name and key when look_up_lock_class() got confused 2022-09-21 09:58:21 +02:00
locktorture.c locktorture,rcutorture,torture: Always log error message 2021-12-07 16:36:17 -08:00
Makefile lockdep: allow instrumenting lockdep.c with KMSAN 2022-12-11 18:12:11 -08:00
mcs_spinlock.h locking: Fix typos in comments 2021-03-22 02:45:52 +01:00
mutex-debug.c locking/ww_mutex: Gather mutex_waiter initialization 2021-08-17 19:04:41 +02:00
mutex.c locking/mutex: Make contention tracepoints more consistent wrt adaptive spinning 2022-04-05 10:24:36 +02:00
mutex.h locking/mutex: Move the 'struct mutex_waiter' definition from <linux/mutex.h> to the internal header 2021-08-17 18:24:31 +02:00
osq_lock.c locking: Fix typos in comments 2021-03-22 02:45:52 +01:00
percpu-rwsem.c locking/percpu-rwsem: Add percpu_is_write_locked() and percpu_is_read_locked() 2022-08-30 10:56:23 +02:00
qrwlock.c locking: Add __lockfunc to slow path functions 2022-08-19 19:47:51 +02:00
qspinlock_paravirt.h locking: Add __lockfunc to slow path functions 2022-08-19 19:47:51 +02:00
qspinlock_stat.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 157 2019-05-30 11:26:37 -07:00
qspinlock.c locking: Add __lockfunc to slow path functions 2022-08-19 19:47:51 +02:00
rtmutex_api.c rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2022-12-12 19:55:56 +01:00
rtmutex_common.h locking/rtmutex: Dont dereference waiter lockless 2021-08-25 15:42:32 +02:00
rtmutex.c rtmutex: Ensure that the top waiter is always woken up 2023-02-06 14:49:13 +01:00
rwbase_rt.c locking: Apply contention tracepoints in the slow path 2022-04-05 10:24:35 +02:00
rwsem.c locking/rwsem: Disable preemption while trying for rwsem lock 2022-09-15 16:14:02 +02:00
semaphore.c locking: Add __sched to semaphore functions 2022-09-15 16:14:03 +02:00
spinlock_debug.c locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
spinlock_rt.c locking/rwlocks: introduce write_lock_nested 2022-01-22 08:33:37 +02:00
spinlock.c locking/spinlocks: Mark spinlocks noinline when inline spinlocks are disabled 2022-08-04 11:05:43 +02:00
test-ww_mutex.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
ww_mutex.h locking/ww_mutex: Add rt_mutex based lock type and accessors 2021-08-17 19:05:11 +02:00
ww_rt_mutex.c kernel/locking: Use a pointer in ww_mutex_trylock(). 2021-11-17 14:48:49 +01:00