Documentation: locking: Describe seqlock design and usage
Proper documentation for the design and usage of sequence counters and sequential locks does not exist. Complete the seqlock.h documentation as follows: - Divide all documentation on a seqcount_t vs. seqlock_t basis. The description for both mechanisms was intermingled, which is incorrect since the usage constrains for each type are vastly different. - Add an introductory paragraph describing the internal design of, and rationale for, sequence counters. - Document seqcount_t writer non-preemptibility requirement, which was not previously documented anywhere, and provide a clear rationale. - Provide template code for seqcount_t and seqlock_t initialization and reader/writer critical sections. - Recommend using seqlock_t by default. It implicitly handles the serialization and non-preemptibility requirements of writers. At seqlock.h: - Remove references to brlocks as they've long been removed from the kernel. - Remove references to gcc-3.x since the kernel's minimum supported gcc version is 4.9. References:0f6ed63b17
("no need to keep brlock macros anymore...") References:6ec4476ac8
("Raise gcc version requirement to 4.9") Signed-off-by: Ahmed S. Darwish <a.darwish@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200720155530.1173732-2-a.darwish@linutronix.de
This commit is contained in:
parent
f05d67179d
commit
0d24f65e93
@ -14,6 +14,7 @@ locking
|
|||||||
mutex-design
|
mutex-design
|
||||||
rt-mutex-design
|
rt-mutex-design
|
||||||
rt-mutex
|
rt-mutex
|
||||||
|
seqlock
|
||||||
spinlocks
|
spinlocks
|
||||||
ww-mutex-design
|
ww-mutex-design
|
||||||
preempt-locking
|
preempt-locking
|
||||||
|
170
Documentation/locking/seqlock.rst
Normal file
170
Documentation/locking/seqlock.rst
Normal file
@ -0,0 +1,170 @@
|
|||||||
|
======================================
|
||||||
|
Sequence counters and sequential locks
|
||||||
|
======================================
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
|
Sequence counters are a reader-writer consistency mechanism with
|
||||||
|
lockless readers (read-only retry loops), and no writer starvation. They
|
||||||
|
are used for data that's rarely written to (e.g. system time), where the
|
||||||
|
reader wants a consistent set of information and is willing to retry if
|
||||||
|
that information changes.
|
||||||
|
|
||||||
|
A data set is consistent when the sequence count at the beginning of the
|
||||||
|
read side critical section is even and the same sequence count value is
|
||||||
|
read again at the end of the critical section. The data in the set must
|
||||||
|
be copied out inside the read side critical section. If the sequence
|
||||||
|
count has changed between the start and the end of the critical section,
|
||||||
|
the reader must retry.
|
||||||
|
|
||||||
|
Writers increment the sequence count at the start and the end of their
|
||||||
|
critical section. After starting the critical section the sequence count
|
||||||
|
is odd and indicates to the readers that an update is in progress. At
|
||||||
|
the end of the write side critical section the sequence count becomes
|
||||||
|
even again which lets readers make progress.
|
||||||
|
|
||||||
|
A sequence counter write side critical section must never be preempted
|
||||||
|
or interrupted by read side sections. Otherwise the reader will spin for
|
||||||
|
the entire scheduler tick due to the odd sequence count value and the
|
||||||
|
interrupted writer. If that reader belongs to a real-time scheduling
|
||||||
|
class, it can spin forever and the kernel will livelock.
|
||||||
|
|
||||||
|
This mechanism cannot be used if the protected data contains pointers,
|
||||||
|
as the writer can invalidate a pointer that the reader is following.
|
||||||
|
|
||||||
|
|
||||||
|
.. _seqcount_t:
|
||||||
|
|
||||||
|
Sequence counters (``seqcount_t``)
|
||||||
|
==================================
|
||||||
|
|
||||||
|
This is the the raw counting mechanism, which does not protect against
|
||||||
|
multiple writers. Write side critical sections must thus be serialized
|
||||||
|
by an external lock.
|
||||||
|
|
||||||
|
If the write serialization primitive is not implicitly disabling
|
||||||
|
preemption, preemption must be explicitly disabled before entering the
|
||||||
|
write side section. If the read section can be invoked from hardirq or
|
||||||
|
softirq contexts, interrupts or bottom halves must also be respectively
|
||||||
|
disabled before entering the write section.
|
||||||
|
|
||||||
|
If it's desired to automatically handle the sequence counter
|
||||||
|
requirements of writer serialization and non-preemptibility, use
|
||||||
|
:ref:`seqlock_t` instead.
|
||||||
|
|
||||||
|
Initialization::
|
||||||
|
|
||||||
|
/* dynamic */
|
||||||
|
seqcount_t foo_seqcount;
|
||||||
|
seqcount_init(&foo_seqcount);
|
||||||
|
|
||||||
|
/* static */
|
||||||
|
static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
|
||||||
|
|
||||||
|
/* C99 struct init */
|
||||||
|
struct {
|
||||||
|
.seq = SEQCNT_ZERO(foo.seq),
|
||||||
|
} foo;
|
||||||
|
|
||||||
|
Write path::
|
||||||
|
|
||||||
|
/* Serialized context with disabled preemption */
|
||||||
|
|
||||||
|
write_seqcount_begin(&foo_seqcount);
|
||||||
|
|
||||||
|
/* ... [[write-side critical section]] ... */
|
||||||
|
|
||||||
|
write_seqcount_end(&foo_seqcount);
|
||||||
|
|
||||||
|
Read path::
|
||||||
|
|
||||||
|
do {
|
||||||
|
seq = read_seqcount_begin(&foo_seqcount);
|
||||||
|
|
||||||
|
/* ... [[read-side critical section]] ... */
|
||||||
|
|
||||||
|
} while (read_seqcount_retry(&foo_seqcount, seq));
|
||||||
|
|
||||||
|
|
||||||
|
.. _seqlock_t:
|
||||||
|
|
||||||
|
Sequential locks (``seqlock_t``)
|
||||||
|
================================
|
||||||
|
|
||||||
|
This contains the :ref:`seqcount_t` mechanism earlier discussed, plus an
|
||||||
|
embedded spinlock for writer serialization and non-preemptibility.
|
||||||
|
|
||||||
|
If the read side section can be invoked from hardirq or softirq context,
|
||||||
|
use the write side function variants which disable interrupts or bottom
|
||||||
|
halves respectively.
|
||||||
|
|
||||||
|
Initialization::
|
||||||
|
|
||||||
|
/* dynamic */
|
||||||
|
seqlock_t foo_seqlock;
|
||||||
|
seqlock_init(&foo_seqlock);
|
||||||
|
|
||||||
|
/* static */
|
||||||
|
static DEFINE_SEQLOCK(foo_seqlock);
|
||||||
|
|
||||||
|
/* C99 struct init */
|
||||||
|
struct {
|
||||||
|
.seql = __SEQLOCK_UNLOCKED(foo.seql)
|
||||||
|
} foo;
|
||||||
|
|
||||||
|
Write path::
|
||||||
|
|
||||||
|
write_seqlock(&foo_seqlock);
|
||||||
|
|
||||||
|
/* ... [[write-side critical section]] ... */
|
||||||
|
|
||||||
|
write_sequnlock(&foo_seqlock);
|
||||||
|
|
||||||
|
Read path, three categories:
|
||||||
|
|
||||||
|
1. Normal Sequence readers which never block a writer but they must
|
||||||
|
retry if a writer is in progress by detecting change in the sequence
|
||||||
|
number. Writers do not wait for a sequence reader::
|
||||||
|
|
||||||
|
do {
|
||||||
|
seq = read_seqbegin(&foo_seqlock);
|
||||||
|
|
||||||
|
/* ... [[read-side critical section]] ... */
|
||||||
|
|
||||||
|
} while (read_seqretry(&foo_seqlock, seq));
|
||||||
|
|
||||||
|
2. Locking readers which will wait if a writer or another locking reader
|
||||||
|
is in progress. A locking reader in progress will also block a writer
|
||||||
|
from entering its critical section. This read lock is
|
||||||
|
exclusive. Unlike rwlock_t, only one locking reader can acquire it::
|
||||||
|
|
||||||
|
read_seqlock_excl(&foo_seqlock);
|
||||||
|
|
||||||
|
/* ... [[read-side critical section]] ... */
|
||||||
|
|
||||||
|
read_sequnlock_excl(&foo_seqlock);
|
||||||
|
|
||||||
|
3. Conditional lockless reader (as in 1), or locking reader (as in 2),
|
||||||
|
according to a passed marker. This is used to avoid lockless readers
|
||||||
|
starvation (too much retry loops) in case of a sharp spike in write
|
||||||
|
activity. First, a lockless read is tried (even marker passed). If
|
||||||
|
that trial fails (odd sequence counter is returned, which is used as
|
||||||
|
the next iteration marker), the lockless read is transformed to a
|
||||||
|
full locking read and no retry loop is necessary::
|
||||||
|
|
||||||
|
/* marker; even initialization */
|
||||||
|
int seq = 0;
|
||||||
|
do {
|
||||||
|
read_seqbegin_or_lock(&foo_seqlock, &seq);
|
||||||
|
|
||||||
|
/* ... [[read-side critical section]] ... */
|
||||||
|
|
||||||
|
} while (need_seqretry(&foo_seqlock, seq));
|
||||||
|
done_seqretry(&foo_seqlock, seq);
|
||||||
|
|
||||||
|
|
||||||
|
API documentation
|
||||||
|
=================
|
||||||
|
|
||||||
|
.. kernel-doc:: include/linux/seqlock.h
|
@ -1,36 +1,15 @@
|
|||||||
/* SPDX-License-Identifier: GPL-2.0 */
|
/* SPDX-License-Identifier: GPL-2.0 */
|
||||||
#ifndef __LINUX_SEQLOCK_H
|
#ifndef __LINUX_SEQLOCK_H
|
||||||
#define __LINUX_SEQLOCK_H
|
#define __LINUX_SEQLOCK_H
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Reader/writer consistent mechanism without starving writers. This type of
|
* seqcount_t / seqlock_t - a reader-writer consistency mechanism with
|
||||||
* lock for data where the reader wants a consistent set of information
|
* lockless readers (read-only retry loops), and no writer starvation.
|
||||||
* and is willing to retry if the information changes. There are two types
|
|
||||||
* of readers:
|
|
||||||
* 1. Sequence readers which never block a writer but they may have to retry
|
|
||||||
* if a writer is in progress by detecting change in sequence number.
|
|
||||||
* Writers do not wait for a sequence reader.
|
|
||||||
* 2. Locking readers which will wait if a writer or another locking reader
|
|
||||||
* is in progress. A locking reader in progress will also block a writer
|
|
||||||
* from going forward. Unlike the regular rwlock, the read lock here is
|
|
||||||
* exclusive so that only one locking reader can get it.
|
|
||||||
*
|
*
|
||||||
* This is not as cache friendly as brlock. Also, this may not work well
|
* See Documentation/locking/seqlock.rst
|
||||||
* for data that contains pointers, because any writer could
|
|
||||||
* invalidate a pointer that a reader was following.
|
|
||||||
*
|
*
|
||||||
* Expected non-blocking reader usage:
|
* Copyrights:
|
||||||
* do {
|
* - Based on x86_64 vsyscall gettimeofday: Keith Owens, Andrea Arcangeli
|
||||||
* seq = read_seqbegin(&foo);
|
|
||||||
* ...
|
|
||||||
* } while (read_seqretry(&foo, seq));
|
|
||||||
*
|
|
||||||
*
|
|
||||||
* On non-SMP the spin locks disappear but the writer still needs
|
|
||||||
* to increment the sequence variables because an interrupt routine could
|
|
||||||
* change the state of the data.
|
|
||||||
*
|
|
||||||
* Based on x86_64 vsyscall gettimeofday
|
|
||||||
* by Keith Owens and Andrea Arcangeli
|
|
||||||
*/
|
*/
|
||||||
|
|
||||||
#include <linux/spinlock.h>
|
#include <linux/spinlock.h>
|
||||||
@ -41,8 +20,8 @@
|
|||||||
#include <asm/processor.h>
|
#include <asm/processor.h>
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* The seqlock interface does not prescribe a precise sequence of read
|
* The seqlock seqcount_t interface does not prescribe a precise sequence of
|
||||||
* begin/retry/end. For readers, typically there is a call to
|
* read begin/retry/end. For readers, typically there is a call to
|
||||||
* read_seqcount_begin() and read_seqcount_retry(), however, there are more
|
* read_seqcount_begin() and read_seqcount_retry(), however, there are more
|
||||||
* esoteric cases which do not follow this pattern.
|
* esoteric cases which do not follow this pattern.
|
||||||
*
|
*
|
||||||
@ -50,16 +29,30 @@
|
|||||||
* via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
|
* via seqcount_t under KCSAN: upon beginning a seq-reader critical section,
|
||||||
* pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
|
* pessimistically mark the next KCSAN_SEQLOCK_REGION_MAX memory accesses as
|
||||||
* atomics; if there is a matching read_seqcount_retry() call, no following
|
* atomics; if there is a matching read_seqcount_retry() call, no following
|
||||||
* memory operations are considered atomic. Usage of seqlocks via seqlock_t
|
* memory operations are considered atomic. Usage of the seqlock_t interface
|
||||||
* interface is not affected.
|
* is not affected.
|
||||||
*/
|
*/
|
||||||
#define KCSAN_SEQLOCK_REGION_MAX 1000
|
#define KCSAN_SEQLOCK_REGION_MAX 1000
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* Version using sequence counter only.
|
* Sequence counters (seqcount_t)
|
||||||
* This can be used when code has its own mutex protecting the
|
*
|
||||||
* updating starting before the write_seqcountbeqin() and ending
|
* This is the raw counting mechanism, without any writer protection.
|
||||||
* after the write_seqcount_end().
|
*
|
||||||
|
* Write side critical sections must be serialized and non-preemptible.
|
||||||
|
*
|
||||||
|
* If readers can be invoked from hardirq or softirq contexts,
|
||||||
|
* interrupts or bottom halves must also be respectively disabled before
|
||||||
|
* entering the write section.
|
||||||
|
*
|
||||||
|
* This mechanism can't be used if the protected data contains pointers,
|
||||||
|
* as the writer can invalidate a pointer that a reader is following.
|
||||||
|
*
|
||||||
|
* If it's desired to automatically handle the sequence counter writer
|
||||||
|
* serialization and non-preemptibility requirements, use a sequential
|
||||||
|
* lock (seqlock_t) instead.
|
||||||
|
*
|
||||||
|
* See Documentation/locking/seqlock.rst
|
||||||
*/
|
*/
|
||||||
typedef struct seqcount {
|
typedef struct seqcount {
|
||||||
unsigned sequence;
|
unsigned sequence;
|
||||||
@ -398,10 +391,6 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
|
|||||||
smp_wmb(); /* increment "sequence" before following stores */
|
smp_wmb(); /* increment "sequence" before following stores */
|
||||||
}
|
}
|
||||||
|
|
||||||
/*
|
|
||||||
* Sequence counter only version assumes that callers are using their
|
|
||||||
* own mutexing.
|
|
||||||
*/
|
|
||||||
static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
|
static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
|
||||||
{
|
{
|
||||||
raw_write_seqcount_begin(s);
|
raw_write_seqcount_begin(s);
|
||||||
@ -434,15 +423,21 @@ static inline void write_seqcount_invalidate(seqcount_t *s)
|
|||||||
kcsan_nestable_atomic_end();
|
kcsan_nestable_atomic_end();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/*
|
||||||
|
* Sequential locks (seqlock_t)
|
||||||
|
*
|
||||||
|
* Sequence counters with an embedded spinlock for writer serialization
|
||||||
|
* and non-preemptibility.
|
||||||
|
*
|
||||||
|
* For more info, see:
|
||||||
|
* - Comments on top of seqcount_t
|
||||||
|
* - Documentation/locking/seqlock.rst
|
||||||
|
*/
|
||||||
typedef struct {
|
typedef struct {
|
||||||
struct seqcount seqcount;
|
struct seqcount seqcount;
|
||||||
spinlock_t lock;
|
spinlock_t lock;
|
||||||
} seqlock_t;
|
} seqlock_t;
|
||||||
|
|
||||||
/*
|
|
||||||
* These macros triggered gcc-3.x compile-time problems. We think these are
|
|
||||||
* OK now. Be cautious.
|
|
||||||
*/
|
|
||||||
#define __SEQLOCK_UNLOCKED(lockname) \
|
#define __SEQLOCK_UNLOCKED(lockname) \
|
||||||
{ \
|
{ \
|
||||||
.seqcount = SEQCNT_ZERO(lockname), \
|
.seqcount = SEQCNT_ZERO(lockname), \
|
||||||
|
Loading…
Reference in New Issue
Block a user