doc: Set down RCU's scheduling-clock-interrupt needs
This commit documents the situations in which RCU needs the scheduling-clock interrupt to be enabled, along with the consequences of failing to meet RCU's needs in this area.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
parent 8a597d636f
commit 850bf6d592
@@ -2080,6 +2080,8 @@ Some of the relevant points of interest are as follows:
<li> <a href="#Scheduler and RCU">Scheduler and RCU</a>.
<li> <a href="#Tracing and RCU">Tracing and RCU</a>.
<li> <a href="#Energy Efficiency">Energy Efficiency</a>.
<li> <a href="#Scheduling-Clock Interrupts and RCU">
Scheduling-Clock Interrupts and RCU</a>.
<li> <a href="#Memory Efficiency">Memory Efficiency</a>.
<li> <a href="#Performance, Scalability, Response Time, and Reliability">
Performance, Scalability, Response Time, and Reliability</a>.
@@ -2532,6 +2534,134 @@ I learned of many of these requirements via angry phone calls:
Flaming me on the Linux-kernel mailing list was apparently not
sufficient to fully vent their ire at RCU's energy-efficiency bugs!

<h3><a name="Scheduling-Clock Interrupts and RCU">
Scheduling-Clock Interrupts and RCU</a></h3>

<p>
The kernel transitions between in-kernel non-idle execution, userspace
execution, and the idle loop.
Depending on kernel configuration, RCU handles these states differently:

<table border=3>
<tr><th><tt>HZ</tt> Kconfig</th>
<th>In-Kernel</th>
<th>Usermode</th>
<th>Idle</th></tr>
<tr><th align="left"><tt>HZ_PERIODIC</tt></th>
<td>Can rely on scheduling-clock interrupt.</td>
<td>Can rely on scheduling-clock interrupt and its
detection of interrupts from usermode.</td>
<td>Can rely on RCU's dyntick-idle detection.</td></tr>
<tr><th align="left"><tt>NO_HZ_IDLE</tt></th>
<td>Can rely on scheduling-clock interrupt.</td>
<td>Can rely on scheduling-clock interrupt and its
detection of interrupts from usermode.</td>
<td>Can rely on RCU's dyntick-idle detection.</td></tr>
<tr><th align="left"><tt>NO_HZ_FULL</tt></th>
<td>Can only sometimes rely on scheduling-clock interrupt.
In other cases, it is necessary to bound kernel execution
times and/or use IPIs.</td>
<td>Can rely on RCU's dyntick-idle detection.</td>
<td>Can rely on RCU's dyntick-idle detection.</td></tr>
</table>
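<p>
The "dyntick-idle detection" entries in the table above depend on RCU
tracking, per CPU, whether that CPU is in an extended quiescent state
(idle, or usermode for <tt>NO_HZ_FULL</tt>).
The following is a minimal single-threaded sketch of one way to do this,
assuming a counter that is even while the CPU is idle and odd otherwise,
loosely in the spirit of the kernel's per-CPU dynticks counter.
All names (<tt>struct eqs_model</tt>, <tt>eqs_model_enter()</tt>, and so on)
are hypothetical, and the atomic operations and memory barriers that a real
implementation needs are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified, single-threaded model of dyntick-idle detection.
 * All names are hypothetical.  A real implementation updates the
 * counter with atomic operations and full memory barriers.
 */
struct eqs_model {
	unsigned long ctr;	/* Even: idle (or NO_HZ_FULL usermode). Odd: non-idle. */
};

/* CPU stops non-idle kernel execution: enters idle or usermode. */
void eqs_model_enter(struct eqs_model *m)
{
	assert(m->ctr % 2 == 1);	/* Must currently be non-idle. */
	m->ctr++;
}

/* CPU resumes non-idle kernel execution, e.g., on an interrupt from idle. */
void eqs_model_exit(struct eqs_model *m)
{
	assert(m->ctr % 2 == 0);	/* Must currently be idle. */
	m->ctr++;
}

/* Grace-period side: record the counter when the grace period starts. */
unsigned long eqs_model_snap(const struct eqs_model *m)
{
	return m->ctr;
}

/*
 * Has the CPU passed through a quiescent state since @snap was taken?
 * Yes if it was idle at snapshot time (even value), or if the counter
 * has changed since then, meaning it entered idle at least once.
 */
bool eqs_model_qs_since(const struct eqs_model *m, unsigned long snap)
{
	return snap % 2 == 0 || m->ctr != snap;
}
```

With this scheme, a CPU that is busy when the snapshot is taken and then
goes idle even once is reported as quiescent, while a CPU that stays busy
in the kernel is not.
That second situation is exactly where the scheduling-clock interrupt (or,
for <tt>NO_HZ_FULL</tt>, bounded kernel execution times and IPIs) must take
over.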

<table>
<tr><th> </th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
Why can't <tt>NO_HZ_FULL</tt> in-kernel execution rely on the
scheduling-clock interrupt, just like <tt>HZ_PERIODIC</tt>
and <tt>NO_HZ_IDLE</tt> do?
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
Because, as a performance optimization, <tt>NO_HZ_FULL</tt>
does not necessarily re-enable the scheduling-clock interrupt
on entry to each and every system call.
</font></td></tr>
<tr><td> </td></tr>
</table>

<p>
However, RCU must be reliably informed as to whether any given
CPU is currently in the idle loop, and, for <tt>NO_HZ_FULL</tt>,
also whether that CPU is executing in usermode, as discussed
<a href="#Energy Efficiency">earlier</a>.
It also requires that the scheduling-clock interrupt be enabled when
RCU needs it to be:

<ol>
<li> If a CPU is either idle or executing in usermode, and RCU believes
it is non-idle, the scheduling-clock tick had better be running.
Otherwise, you will get RCU CPU stall warnings or, at best,
very long (11-second) grace periods, with a pointless IPI waking
the CPU from time to time.
<li> If a CPU is in a portion of the kernel that executes RCU read-side
critical sections, and RCU believes this CPU to be idle, you will get
random memory corruption. <b>DON'T DO THIS!!!</b>

<br>This is one reason to test with lockdep, which will complain
about this sort of thing.
<li> If a CPU is in a portion of the kernel that is absolutely
positively no-joking guaranteed to never execute any RCU read-side
critical sections, and RCU believes this CPU to be idle,
no problem. This sort of thing is used by some architectures
for light-weight exception handlers, which can then avoid the
overhead of <tt>rcu_irq_enter()</tt> and <tt>rcu_irq_exit()</tt>
at exception entry and exit, respectively.
Some go further and avoid <tt>irq_enter()</tt>
and <tt>irq_exit()</tt> entirely.

<br>Just make very sure you are running some of your tests with
<tt>CONFIG_PROVE_RCU=y</tt>, just in case one of your code paths
was in fact joking about not doing RCU read-side critical sections.
<li> If a CPU is executing in the kernel with the scheduling-clock
interrupt disabled and RCU believes this CPU to be non-idle,
and if the CPU goes idle (from an RCU perspective) every few
jiffies, no problem. It is usually OK for there to be the
occasional gap between idle periods of up to a second or so.

<br>If the gap grows too long, you get RCU CPU stall warnings.
<li> If a CPU is either idle or executing in usermode, and RCU believes
it to be idle, of course no problem.
<li> If a CPU is executing in the kernel, the kernel code
path is passing through quiescent states at a reasonable
frequency (preferably about once per few jiffies, but the
occasional excursion to a second or so is usually OK) and the
scheduling-clock interrupt is enabled, of course no problem.

<br>If the gap between a successive pair of quiescent states grows
too long, you get RCU CPU stall warnings.
</ol>
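<p>
The six cases above can be condensed into a small decision function.
The sketch below is a hypothetical user-space model, not kernel code:
the types and the <tt>classify()</tt> function are invented for
illustration, cases are numbered in list order, and the one-second
(<tt>hz</tt> jiffies) bound on gaps is an assumption taken from the
list's "a second or so" rather than a kernel constant.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical types for illustration; these are not kernel APIs. */
enum cpu_ctx { CTX_IDLE, CTX_USER, CTX_KERNEL };

enum outcome {
	OUT_OK,		/* No problem. */
	OUT_STALL,	/* RCU CPU stall warnings, or at best very long grace periods. */
	OUT_CORRUPTION,	/* Random memory corruption: DON'T DO THIS!!! */
};

struct cpu_situation {
	enum cpu_ctx ctx;		/* What the CPU is actually doing. */
	bool rcu_thinks_idle;		/* What RCU has been told. */
	bool tick_enabled;		/* Is the scheduling-clock interrupt running? */
	bool may_run_readers;		/* Might this kernel path run RCU readers? */
	unsigned long max_gap_jiffies;	/* Longest gap between RCU-idle periods
					 * (tick off) or quiescent states (tick on). */
};

enum outcome classify(const struct cpu_situation *s, unsigned long hz)
{
	assert(hz > 0);

	if (s->ctx == CTX_IDLE || s->ctx == CTX_USER) {
		if (s->rcu_thinks_idle)
			return OUT_OK;				/* Case 5. */
		return s->tick_enabled ? OUT_OK : OUT_STALL;	/* Case 1. */
	}

	/* In-kernel execution from here on. */
	if (s->rcu_thinks_idle)
		return s->may_run_readers ? OUT_CORRUPTION	/* Case 2. */
					  : OUT_OK;		/* Case 3. */

	/* Cases 4 and 6: assumed "usually OK" bound of about one second. */
	return s->max_gap_jiffies <= hz ? OUT_OK : OUT_STALL;
}
```

Note that the memory-corruption outcome does not depend on the
scheduling-clock interrupt at all: no amount of ticking helps once RCU
believes a CPU running RCU readers to be idle.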

<table>
<tr><th> </th></tr>
<tr><th align="left">Quick Quiz:</th></tr>
<tr><td>
But what if my driver has a hardware interrupt handler
that can run for many seconds?
I cannot invoke <tt>schedule()</tt> from a hardware
interrupt handler, after all!
</td></tr>
<tr><th align="left">Answer:</th></tr>
<tr><td bgcolor="#ffffff"><font color="ffffff">
One approach is to do <tt>rcu_irq_exit(); rcu_irq_enter();</tt>
every so often.
But given that long-running interrupt handlers can cause
other problems, not least for response time, shouldn't you
work to keep your interrupt handler's runtime within reasonable
bounds?
</font></td></tr>
<tr><td> </td></tr>
</table>
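<p>
The first approach in the answer above can be sketched as follows.
This is a user-space model, not a driver: <tt>stub_rcu_irq_exit()</tt>
and <tt>stub_rcu_irq_enter()</tt> merely count calls in place of the
kernel's <tt>rcu_irq_exit()</tt> and <tt>rcu_irq_enter()</tt>, and
<tt>process_one_item()</tt> and <tt>ITEMS_PER_RCU_BREAK</tt> are
hypothetical.
The one constraint carried over from the list above is that no RCU
read-side critical section may span the exit/enter pair, because RCU
may briefly believe the CPU to be idle at that point.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space stand-ins for the kernel's rcu_irq_exit() and
 * rcu_irq_enter().  They only count calls, so that the calling
 * pattern can be exercised outside the kernel.
 */
static unsigned long nr_rcu_irq_exit;
static unsigned long nr_rcu_irq_enter;

static void stub_rcu_irq_exit(void)
{
	nr_rcu_irq_exit++;
}

static void stub_rcu_irq_enter(void)
{
	nr_rcu_irq_enter++;
}

/* Hypothetical per-item work performed by the interrupt handler. */
static void process_one_item(size_t i)
{
	(void)i;
}

/* Hypothetical number of items processed between RCU breaks. */
#define ITEMS_PER_RCU_BREAK 64

/*
 * Body of a (hypothetical) long-running hardware interrupt handler.
 * Every ITEMS_PER_RCU_BREAK items, it briefly leaves and re-enters
 * interrupt context from RCU's point of view.  No RCU read-side
 * critical section may be held across this pair.
 */
void long_irq_handler_body(size_t nitems)
{
	for (size_t i = 0; i < nitems; i++) {
		process_one_item(i);
		if ((i + 1) % ITEMS_PER_RCU_BREAK == 0) {
			stub_rcu_irq_exit();
			stub_rcu_irq_enter();
		}
	}
	/* Exit and enter calls must always remain paired. */
	assert(nr_rcu_irq_exit == nr_rcu_irq_enter);
}
```

The batch size trades overhead against how long RCU can go without
seeing the CPU pass through idle; as the answer says, shortening the
handler itself remains the better fix.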

<p>
But as long as RCU is properly informed of kernel state transitions between
in-kernel execution, usermode execution, and idle, and as long as the
scheduling-clock interrupt is enabled when RCU needs it to be, you
can rest assured that the bugs you encounter will be in some other
part of RCU or some other part of the kernel!

<h3><a name="Memory Efficiency">Memory Efficiency</a></h3>

<p>